UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Machine learning estimation of snow water equivalent over British Columbia Snauffer, Andrew Matthew 2017

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


24-ubc_2018_february_snauffer_andrew.pdf [ 15MB ]
JSON: 24-1.0363029.json
JSON-LD: 24-1.0363029-ld.json
RDF/XML (Pretty): 24-1.0363029-rdf.xml
RDF/JSON: 24-1.0363029-rdf.json
Turtle: 24-1.0363029-turtle.txt
N-Triples: 24-1.0363029-rdf-ntriples.txt
Original Record: 24-1.0363029-source.json
Full Text

Full Text

Machine Learning Estimation of Snow Water Equivalentover British ColumbiabyAndrew Matthew SnaufferM.S. Environmental Engineering, Michigan Technological University, 2008B.S. Nuclear Engineering, The Pennsylvania State University, 1996A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Geological Engineering)The University of British Columbia(Vancouver)December 2017© Andrew Matthew Snauffer, 2017AbstractIn regions with hydrologic regimes dominated by spring snowmelt, reliable waterresources forecasting and planning require accurate estimates of snow water equiv-alent (SWE). While various gridded data products can provide such estimates, theyare especially challenging under conditions of mountainous terrain, heavy forestcover and large snow accumulations. These conditions aptly describe the provinceof British Columbia (BC), Canada. SWE values from reanalyses, land data as-similation systems and observation-based data sets were compared with manualsnow surveys over the five physiographic regions of BC. Yearly time series weregenerated for each survey month (January through June), and correlation, bias andmean absolute error (MAE) were found for each product and physiographic re-gion. Better SWE estimates were seen in regions of lower snow accumulationand land relief, and a product performance ranking found three products to be thebest: ERA-Interim/Land, GLDAS-2 and MERRA. Using the snow surveys as tar-get data, combinations of the gridded SWE products and primary spatiotemporalcovariates (survey date, year, latitude, longitude, elevation and grid cell elevationdifferences) were incorporated into multiple linear regression (MLR) and artificialneural network (ANN) models. The ANN using the best three products was foundto have the lowest overall MAE compared to other products and models. Thisbase ANN was then enhanced to include terrain covariates (slope, aspect and Ter-rain Ruggedness Index) and a simple one-layer energy balance snow model drivenby bias-corrected ANUSPLIN temperatures and precipitations. Lagged predictorvalues compensated for early snow model melt off. These enhancements furtherimproved mean station MAE by 10% and correlations by 0.04. The ANN modelswere also compared with the Variable Infiltration Capacity model calibrated foriifour BC watersheds. While the base ANN produced statistically significant MAEreductions in one region and correlation increases in two regions, the enhancedANN achieved larger improvements that were significant in three and all five re-gions, respectively. These results demonstrate the promise of machine learningin fusing complementary information from sophisticated lower-resolution griddedproducts and a simple but higher-resolution snow model, improving SWE estima-tion in challenging contexts.iiiLay SummaryUnderstanding the amounts of snow received throughout the year is important forwater planning purposes, especially in areas that get a significant portion of theiryearly water supply from snowmelt. British Columbia’s many mountains, denseforests and high snow accumulations make this an especially challenging task inthis province. The goal of this work is to take existing large-scale data sets andcombine them with local information and actual snow measurements using a ma-chine learning model. The model learns the relationships between the inputs andthe measurements and can then make estimates throughout the province. Thesenew estimates are compared with the measurements and shown to be on averagesignificantly better in both amounts and year-to-year changes. This suggests thatthe machine learning approach can be used to get better estimates of snow in chal-lenging mountainous areas.ivPrefaceThe following details the contributions to and publications arising from this work.Chapter 1. Figure 1.1 was used with permission from the source. WilliamHsieh provided this figure and equations and suggested improvements to the de-scriptions of the perceptron model. Other descriptions including those of the snowdata and gridded products are my original work.Chapter 2. A version of this chapter was published as Snauffer et al. [2016].All content is my original work, written with the guidance of my coauthors WilliamHsieh and Alex Cannon.Chapter 3. A version of this chapter was published as a discussion articlein The Cryosphere Discussions as Snauffer et al. [2017] and at the time of the-sis submission has been accepted for publication as a peer-reviewed manuscriptin The Cryosphere. Markus Schnorbus provided input on known issues with theVariable Infiltration Capacity model and on appropriate spatial breakout schemes.William Hsieh suggested model configurations such as hidden node steps and pro-vided overall guidance and direction. Alex Cannon provided the template callingfunction for the R package monmlp as well as an improved description of thispackage in the text, plus overall guidance and direction. The remaining content ismy original work.Chapter 4. This chapter is planned for submission for publication at the time ofthesis submission. R. Dan Moore gave extensive comments on the text and sugges-tions on selection and use of statistics. Alex Cannon provided the bias-correctedand regridded ANUSPLIN temperature and precipitation data set. William Hsiehclosely reviewed all figures and provided overall guidance and direction. The finalcontent is my original work.vChapter 5. William Hsieh provided comments and a final summary point onthe abilities of machine learning. All remaining content is my original work.viTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Snow Water Equivalent Measurement . . . . . . . . . . . . . . . 51.2.1 Manual Snow Surveys . . . . . . . . . . . . . . . . . . . 51.2.2 Automated Snow Weather Stations . . . . . . . . . . . . . 51.3 Gridded Products . . . . . . . . . . . . . . . . . . . . . . . . . . 61.4 Research Goal and Objectives . . . . . . . . . . . . . . . . . . . 82 Gridded Product Comparison . . . . . . . . . . . . . . . . . . . . . 102.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10vii2.2 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.3.1 Data Sources . . . . . . . . . . . . . . . . . . . . . . . . 202.3.2 Assignment of Grid Cells and Construction of Time Series 252.3.3 Performance Scores . . . . . . . . . . . . . . . . . . . . 262.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282.4.1 Seasonal Performance Across BC . . . . . . . . . . . . . 282.4.2 Physiographic Region Characterizations . . . . . . . . . . 322.4.3 Elevation Dependence . . . . . . . . . . . . . . . . . . . 342.4.4 Ranking of Gridded Products . . . . . . . . . . . . . . . . 352.4.5 Confidence Intervals . . . . . . . . . . . . . . . . . . . . 372.5 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . 383 Data Fusion of Gridded Products . . . . . . . . . . . . . . . . . . . . 423.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.2 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453.3.1 Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . 473.3.2 Artificial Neural Network Construction . . . . . . . . . . 483.3.3 Hydrologic Model Comparison . . . . . . . . . . . . . . 533.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664 Enhancements with Snow Modeling and Land Covariates . . . . . . 684.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684.1.1 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . 734.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . 834.3.1 Enhanced ANN Comparison . . . . . . . . . . . . . . . . 834.3.2 MAE and Correlation Values and Differences . . . . . . . 864.3.3 Regional MAE and Correlation Differences . . . . . . . . 884.3.4 SWE Maps and Station Predictions . . . . . . . . . . . . 904.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93viii5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100ixList of TablesTable 2.1 Physiographic Region Summary . . . . . . . . . . . . . . . . 19Table 2.2 Gridded SWE Product Data Evaluated in this Study . . . . . . 20Table 2.3 Gridded SWE Product Median Number of Years Analyzed bySurvey Month . . . . . . . . . . . . . . . . . . . . . . . . . . 25Table 2.4 Rank Score of Gridded Products* . . . . . . . . . . . . . . . . 37Table 2.5 Statistically Significant Performance Comparisons* . . . . . . 38Table 3.1 Mean station MAEs (in mm SWE) for individual and combi-nations of products and corresponding MLR and ANN runs,with the mean station MAEs computed using only the VIC sta-tions given in parenthesis. Products shown are ERAInterim (E),GlobSnow (G), MERRALand (ML), MERRA (M), GLDAS2(G2) and ERALand (EL). . . . . . . . . . . . . . . . . . . . . 52Table 4.1 Physiographic Region Summary . . . . . . . . . . . . . . . . 76Table 4.2 Enhanced ANN Run Predictor Combinations . . . . . . . . . . 84xList of FiguresFigure 1.1 A schematic artificial neural network with one hidden layer,where xi, h j and yk denote, respectively, the input variables,hidden neurons and output variables (source: Hsieh [2009,Figure 4.4]). . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Figure 2.1 Physiographic Regions of British Columbia, Canada, overlaidon the GLOBE 1-km DEM. Elevations are shown accordingto the legend with units of meters above sea level. The fivephysiographic regions are outlined in red: (A) BC Plains, (B)Interior Plateau, (C) Northern and Central Plateaus and Moun-tains, (D) Columbia Mountains and Southern Rockies, and (E)Coast Mountains and Islands. SWE accumulation generallyincreases from region A to region E. Manual Snow Survey sta-tion locations are indicated by green dots. Inset: A map ofCanada (source: Wikimedia Commons) shows BC highlightedin red. The province is bordered to the south by the continentalUnited States (USA) and to the west by the US state of Alaska(AK). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15Figure 2.2 ERALand SWE maps for April 1 for the high and low years(a) 1999 and (b) 2005, respectively. The five physiographic re-gions of BC are outlined in red for reference. Grayscale shad-ing indicates mm SWE according to the legend, while whiteindicates SWE values over 1000 mm. . . . . . . . . . . . . . 17xiFigure 2.3 SWE time series for snow course stations and gridded prod-ucts in each of the 5 physiographic regions of BC: (a) station4C05 Fort Nelson Airport in the BC Plains, (b) station 2F11Isintok Lake in the Interior Plateau, (c) station 4B08 MountCronin in the Northern and Central Plateaus and Mountains,(d) station 2A25 Kirbyville Lake in the Columbia Mountainsand Southern Rockies, and (e) station 1D08 Stave Lake in theCoast Mountains and Islands. Note that MSS are point mea-surements, while gridded products are spatially distributed. . . 18Figure 2.4 Median station normalized MAEs for the April survey for the8 evaluated products: (a) GlobSnow (b) CMC (c) GLDAS1(d) GLDAS2 (e) ERAInterim (f) ERALand (g) MERRA (h)MERRALand. Normalized MAEs from 0 to 1 are representedon a color scale from white to red; values over 1 are repre-sented by dark red. . . . . . . . . . . . . . . . . . . . . . . . 29Figure 2.5 Performance of gridded products relative to manual snow sur-veys for primary survey months as measured by the mediancomputed over the whole of BC. a) Pearson correlation. b)Bias. c) Normalized bias (bias divided by mean station AprilSWE). d) Mean absolute error (MAE). e) Normalized MAE(MAE divided by mean station April SWE). . . . . . . . . . . 30Figure 2.6 Heat maps showing median statistical values by gridded prod-uct, survey month (January through June), and region: BCPlains (A), Interior Plateau (B), Northern and Central Plateausand Mountains (C), Columbia Mountains and Southern Rock-ies (D), and Coast Mountains and Islands (E). SWE accumu-lation generally increases from region A to region E. Valuesto the left of each row represent medians across all surveymonths. Boxes are blacked out when insufficient data waspresent. Statistics shown are: a) Pearson correlation. b) Biasin mm SWE. c) Normalized bias (bias divided by mean stationApril SWE). d) Mean absolute error (MAE) in mm SWE. d)Normalized MAE (MAE divided by mean station April SWE). 32xiiFigure 2.7 Bias contour plots, and percent of variance attributable to el-evation and elevation difference for (a) GlobSnow, 17% (b)CMC, 22% (c) GLDAS1, 23% (d) GLDAS2, 33% (e) ERA-Interim, 22% (f) ERALand, 38% (g) MERRA, 41% (h) MERRA-Land, 35%. Biases for the April survey are shown as contourson a plane of mean grid cell elevation difference vs. station ele-vation. Positive biases and elevation differences indicate meangrid cell values greater than those of the respective stations. . 36Figure 3.1 Physiographic Regions of British Columbia, Canada, overlaidon the GLOBE 1-km DEM. Elevations are shown according tothe legend with units of meters above sea level. The five phys-iographic regions are outlined in red: (A) Coast Mountainsand Islands, (B) Columbia Mountains and Southern Rockies,(C) Northern and Central Plateaus and Mountains, (D) Inte-rior Plateau, and (E) BC Plains. SWE accumulations generallyincrease from region A to region E. Outlined in blue is theVIC domain run by the Pacific Climate Impacts Consortium,covering four basins throughout BC: (I) the Peace, (II) upperColumbia, (III) Fraser and (IV) Campbell River watersheds.Manual Snow Survey station locations are indicated by greendots. Inset: A map of Canada (source: Wikimedia Commons)shows BC highlighted in red. The province is bordered to thesouth by the continental United States (USA) and to the westby the US state of Alaska (AK). . . . . . . . . . . . . . . . . 46Figure 3.2 MSS station locations used in evaluating combination prod-ucts. Green dots show the distribution of ten test splits forcomparing means, MLRs and ANNs to gridded SWE productsthroughout BC by cross-validation. Blue crosses show the dis-tribution of seven test splits for comparing ANNs to VIC runsin four BC watersheds. Physiographic regions are outlined inred. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50xiiiFigure 3.3 SWE time series for manual snow survey stations, the bestthree gridded SWE products, MLR3 and ANN3. Gridded prod-ucts are ordered by increasing overall performance, with darkerblues for better performing products. MLR3 and ANN3 esti-mates are masked outside the mean first and final survey dates(January 2 and June 15 respectively). A representative stationis shown for each of the five physiographic regions of BC: (a)station 1D08 Stave Lake in the Coast Mountains and Islands,(b) station 2A25 Kirbyville Lake in the Columbia Mountainsand Southern Rockies, (c) station 4B08 Mount Cronin in theNorthern and Central Plateaus and Mountains, (d) station 2F11Isintok Lake in the Interior Plateau, and (e) station 4C05 FortNelson Airport in the BC Plains. . . . . . . . . . . . . . . . . 57Figure 3.4 Mean station MAE for several SWE products/combinations forregions of BC in order of descending mean accumulations: (a)Coast Mountains and Islands, (b) Columbia Mountains andSouthern Rockies, (c) Northern Plateaus and Mountains, (d)Interior Plateau, (e) BC Plains, and (f) province-wide (combin-ing all data from the five preceding panels). Shown are MAEswith 95% confidence intervals based on n = 5000 bootstrapsamples for the six gridded products in blue: ERA-Interim (E),GlobSnow (GS), MERRALand (ML), MERRA (M), GLDAS2(G2) and ERALand (EL). Dark blue indicates three best per-forming products. Also shown are three fusion techniques(mean, MLR and ANN) using all six products (green) and thethree best performing products (brown). Note the vertical scalein panels (a) and (b) is double that of panels (c) through (f). . 58xivFigure 3.5 Mean station April Correlation for several SWE products/com-binations for regions of BC in order of descending mean ac-cumulations: (a) Coast Mountains and Islands, (b) ColumbiaMountains and Southern Rockies, (c) Northern Plateaus andMountains, (d) Interior Plateau, (e) BC Plains, and (f) province-wide. Product and combination abbreviations and colors are asin Figure 3.4. . . . . . . . . . . . . . . . . . . . . . . . . . . 59Figure 3.6 Bootstrap mean station MAE differences between several SWEproducts/combinations and the reference (the mean of six prod-ucts). First quartile, median and third quartile, plotted respec-tively as the bottom, waist and top of each box, as well as 95%confidence intervals indicated by whiskers, are shown for re-gions of BC in order of descending mean accumulations: (a)Coast Mountains and Islands, (b) Columbia Mountains andSouthern Rockies, (c) Northern Plateaus and Mountains, (d)Interior Plateau, (e) BC Plains, and (f) province-wide. Productand combination abbreviations and colors are as in Figure 3.4.Note differences in vertical scale. . . . . . . . . . . . . . . . 61Figure 3.7 Bootstrap mean station April Correlation differences betweenseveral SWE products/combinations and the reference (the meanof six products). Quartiles and 95% confidence intervals areshown for regions of BC in order of descending mean accumu-lations: (a) Coast Mountains and Islands, (b) Columbia Moun-tains and Southern Rockies, (c) Northern Plateaus and Moun-tains, (d) Interior Plateau, (e) BC Plains, and (f) province-wide.Product and combination abbreviations and colors are as inFigure 3.4. Note differences in vertical scale. . . . . . . . . . 62xvFigure 3.8 Box plots of bootstrap station differences in MAEs and Aprilcorrelations showing quartiles and 95% confidence intervalsfor ANNs of six products, ANN6 (green), and of the best threeproducts, ANN3 (red), with respect to MLR6 and MLR3 re-spectively. Bootstrap difference distributions are shown for (a)ANN6−MLR6 MAEs, (b) ANN3−MLR3 MAEs, (c) ANN6−MLR6April correlations and (d) ANN3−MLR3 April correlations.Shown are the differences for each physiographic region: CoastMountains and Islands (CM), Columbia Mountains and South-ern Rockies (CR), Northern and Central Plateaus and Moun-tains (NP), the Interior Plateau (IP), and the BC Plains (BP),as well as province-wide (BC). . . . . . . . . . . . . . . . . . 64Figure 3.9 Box plots of bootstrap station differences in MAEs and Aprilcorrelations showing quartiles and 95% confidence intervalsfor ANNs of six products, ANN6 (purple), and of the bestthree products, ANN3 (orange), with respect to VIC. Boot-strap difference distributions are shown for (a) ANN6−VICMAEs, (b) ANN3−VIC MAEs, (c) ANN6−VIC April corre-lations and (d) ANN3−VIC April correlations. Notation forthe physiographic regions follows Figure 3.8. . . . . . . . . . 65xviFigure 4.1 Physiographic Regions of British Columbia, Canada, overlaidon the GLOBE 1-km DEM displayed with a standard terraincolor map. The five physiographic regions outlined in redare: (A) Coast Mountains and Islands, (B) Columbia Moun-tains and Southern Rockies, (C) Northern and Central Plateausand Mountains, (D) Interior Plateau, and (E) BC Plains. MeanSWE accumulations increase from region A to region E. Out-lined in blue is the domain covered by the VIC model as runby the Pacific Climate Impacts Consortium, which covers fourbasins throughout BC: (I) the Peace, (II) upper Columbia, (III)Fraser and (IV) Campbell River watersheds. Manual snow sur-vey station locations are indicated by purple dots. Inset: Amap of Canada adapted from Wikimedia Commons highlightsin red the province of BC, bordered to the south by the conti-nental United States (USA) and to the west by the US state ofAlaska (AK). . . . . . . . . . . . . . . . . . . . . . . . . . . 75Figure 4.2 SWE traces for the 1999 snow season, including those of snowpillows and the SnowMelt model with lags of 0, 15 and 30days. Mean SWE values of gridded products MERRA, GLDAS2and ERALand correspond to the dates of manual snow sur-veys (MSS). The following locations are shown: (a) station1B02 Tahtsa Lake in the Coast Mountains and Islands, (b) sta-tion 2C14 Floe Lake in the Columbia Mountains and SouthernRockies, (c) station 4A02 Pine Pass in the Northern Plateausand Mountains, and (d) station 4B15 Lu Lake in the InteriorPlateau. No automated snow pillows are located in the BCPlains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82xviiFigure 4.3 Performance comparisons of ANNs using various combina-tions of snow model lags and topographic covariates with re-spect to the base ANN (gray bars) across the province of BC.Covariates included in each combination are given in Table4.2. For each of the given combinations, mean station (a)MAEs and (b) MAE differences with respect to the base ANNas well as mean station April (c) correlations and (d) corre-lation differences with respect to the base ANN are shown.ANN SM2, highlighted in bold, was selected as the enhancednetwork for further evaluation. . . . . . . . . . . . . . . . . . 85Figure 4.4 Performance comparisons of the best three gridded SWE prod-ucts (MERRA, GLDAS2 and ERALand), the means of thoseproducts (Mean3), the ANUSPLIN-driven EcoHydRology SnowMeltmodel (SnowMelt), the enhanced three-product MLR (MLR3e)and the Variable Infiltration Capacity model (VIC) with theenhanced three-product ANN (ANN3e) at stations within theVIC domain. Mean station (a) MAEs and (b) MAE differenceswith respect to ANN3e as well as mean station April (c) corre-lations and (d) correlation differences with respect to ANN3eare shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . 87Figure 4.5 Regional comparison of the various products and models rel-ative to ANN3e over the VIC domain. Mean station (a) MAEdifferences and (b) April correlation differences with respectto ANN3e are shown for the gridded SWE products and theirmeans, ANNbase, the ANUSPLIN-driven SnowMelt modelresults, MLR, and VIC. . . . . . . . . . . . . . . . . . . . . . 89Figure 4.6 SWE maps for British Columbia for April 1, 1999 for (a) MERRA,(b) GLDAS2, (c) ERALand and (d) ANUSPLIN-driven SnowMeltmodel results. Physiographic regions are outlined in red. Landand water masks were created using physiographic region bor-ders [Holland, 1964] and the GLOBE 1-km DEM [Hastingset al., 1999]. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91xviiiFigure 4.7 SWE map of British Columbia for April 1, 1999 generated byANN3e. (a) SWE map shows physiographic regions (red out-lines) and locations of five MSS stations (purple circles) withletters that correspond to the time series that follow. TheseMSS SWE values and traces from MERRA, GLDAS2, ERA-Land, MLR3e and ANN3e are shown for snow seasons 1999to 2005 for the following stations: (b) station 1D08 Stave Lake,(c) station 2A25 Kirbyville Lake, (d) station 4B08 Mount Cronin,(e) station 2F11 Isintok Lake, (f) station 4C05 Fort Nelson Air-port. MLR3e and ANN3e are masked outside January 2 andJune 15 as in Figure 3.3 . . . . . . . . . . . . . . . . . . . . 92xixGlossaryAFWA Air Force Weather Agency radiation dataANN Artificial Neural NetworkANUSPLIN Australian National University Spline interpolation packageASWS Automated Snow Weather StationsBC British ColumbiaBFGS Broyden-Fletcher-Goldfarb-Shanno optimization methodCMAP Climate Prediction Center Merged Analysis of PrecipitationCMC Canadian Meteorological CentreCPCU Climate Prediction Center “Unified” precipitation productDEM Digital Elevation ModelECMWF European Centre for Medium-Range Weather ForecastsERA European Centre for Medium-Range Weather Forecasts ReanalysisGDAS Global Data Assimilation System atmospheric analysisGLDAS Global Land Data Assimilation SystemGLOBE Global Land One-kilometer Base Elevation DEMGEM Global Environmental Multiscale forecast modelxxGEOS-5 Goddard Earth Observing System Data Assimilation System Version 5HN number of Hidden Neurons in an ANNGOES Geostationary Operational Environmental SatelliteGPCP Global Precipitation Climatology ProjectHTESSEL Hydrology-Tiled ECMWF Scheme for Surface Exchanges over LandHUT Helsinki University of Technology snow microwave emission modelIFS Integrated Forecast SystemLDAS Land Data Assimilation SystemLSM Land Surface ModelMAE Mean Absolute ErrorMASL Meters Above Sea LevelMERRA Modern-Era Retrospective analysis for Research and ApplicationsMLP Multi-Layer PerceptronMLR Multiple Linear RegressionMSS Manual Snow SurveyNOAA National Oceanic and Atmospheric AdministrationRMSE Root Mean Squared ErrorSWE Snow Water EquivalentVIC Variable Infiltration Capacity hydrologic modelxxiAcknowledgmentsFunding for this research was provided by the Canadian Snow and Sea Ice Evolu-tion (CanSISE) network, funded by NSERC of Canada. Thanks to Paul Kushnerfor leading this network, to Ross Brown and Chris Derksen for their input, and toFrancis Zwiers for hosting me at PCIC. I appreciate the efforts of Tony Litke andJessica Byers with the BC Ministry of Environment, who provided data and contextfor the BC Snow Surveys. I thank Kari Luojus, Juval Cohen and Jaakko Ikonenfor making the GlobSnow “mountains unmasked” data set available. I acknowl-edge the Global Modeling and Assimilation Office and the NASA Goddard EarthSciences Data and Information Services Center for dissemination of MERRA,MERRALand and GLDAS-1 and -2. ERAInterim and ERALand SWE data setswere downloaded from the ECMWF Public Datasets, and the CMC data were ob-tained from the National Snow and Ice Data Center. High-resolution ANUSPLINtemperature and precipitation data were provided by NRCan and bias-correctedusing climate normals from ClimateWNA.Thanks to my University Examiners, Susan Allen and Ian McKendry, and to ananonymous external examiner for taking time to review my work. I appreciate theinput and comments of Markus Schorbus and R. Dan Moore, who greatly improvedChapters 3 and 4 respectively. I offer thanks to my committee, Dan Moore, AlexCannon, and William Hsieh. A special thanks to Alex Cannon for his great insights,friendship, and taking me under his wing while at PCIC. I offer a very specialthanks to my advisor William Hsieh for his amazing attention to all my plots, forhis guidance through the years, and for taking a chance on me.Finally, I would like to thank my family for their love and encouragement.Jenny, your unconditional support meant everything to me.xxiiChapter 1IntroductionSnowpacks on mountains are important for their role in water storage. As theymelt, they supply fresh water to streams and rivers, and for hydro-electricity gen-eration. They are also associated with disasters like floods and droughts.The amount of water contained in a snowpack is measured in terms of the snowwater equivalent (SWE), which gives the depth of water if the column of snow ismelted. Accurate estimates of winter SWE are vital for hydrologic forecastingand planning especially when the annual runoff signal is dominated by seasonalsnowmelt. Numerous readily available data products provide such estimates onbasin to hemispheric scales, but all are subject to limitations, especially in regionsof complex topography.Synoptic-scale circulation patterns associated with Pacific Decadal Oscillation(PDO), El Nin˜o–Southern Oscillation (ENSO) and the Pacific North American Pat-tern (PNA) have been found to be responsible for up to 75% of mean winter tem-perature variance and 65% of winter precipitation in British Columbia (BC) [Stahlet al., 2006b]. Moore and McKendry [1996] identified positive 1966-1976 snowanomalies and negative 1977-1992 snow anomalies for southern BC or the entireprovince, results that accorded with changes in sea surface temperatures and PNAcirculation patterns. In the Columbia basin to the south of the province, Hsieh andTang [2001] found negative SWE anomalies during high PNA and El Nin˜o years,with high PNA modes exerting the most impact. Thus changes in SWE over inter-annual/interdecadal time scales present additional challenges and opportunities in1managing this important resource.In this dissertation, the problem of improving SWE estimates is approached viamachine learning.1.1 Machine LearningLong before the days of mainframes and vacuum tubes, people dreamed of intelli-gent devices [Nilsson, 2009]. Today those dreams are rapidly becoming reality inthe field of machine learning, a dominant branch of artificial intelligence. Machinelearning [Bishop, 2006, Murphy, 2012] has been applied to a wide variety of com-putational challenges, including self-driving automobiles, handwriting and speechrecognition, object recognition in computer vision, robotics and computer games,natural language processing, brain–machine interfaces, medical diagnosis, DNAclassification, search engines, spam and fraud detection, and stock market analy-sis. Machine learning has also been applied to the environmental sciences [Hsieh,2009], including hydrology [Govindaraju and Rao, 2000, Abrahart et al., 2008].Although machine learning had originally evolved with almost no interaction withstatistics, in more recent decades, many machine learning methods have been caston a probabilistic framework [Murphy, 2012], with increasing interaction betweenmachine learning and statistics.Machine learning as a science has its roots in the McCulloch and Pitts [1943]model. The model described binary threshold units or “neurons”, which output oneor zero depending on whether the weighted sum of inputs from other units is moreor less than a threshold, and hence whether or not it is activated. Networks of theseneurons were shown to be capable of performing calculations, though there was nomechanism for finding weight and offset parameters.The perceptron model pioneered by [Rosenblatt, 1958, 1962] and [Widrow andHoff, 1960] was the next significant advancement. Consisting of an input layer ofneurons (or nodes) connected to an output layer, the perceptron model incorporateda learning algorithm for finding weight and offset parameters. The addition ofhidden layers of neurons between the input and output layers led to the multi-layerperceptron (MLP) model. With one hidden layer (Figure 1.1), the MLP model2maps the input signals xi to the hidden neurons h j byh j = f (∑iw jixi+b j) , (1.1)where f is called an activation function, and the hyperbolic tangent function is acommon choice for f . The model output isyk = g(∑jw˜k jh j + b˜k) , (1.2)where the activation function g is commonly chosen to be the identity function fornonlinear regression problems, so yk is simply a linear combination of the hiddenneurons h j. The parameters w ji and w˜k j are called weights, and b j and b˜k are calledoffset or bias parameters.MLP training uses nonlinear minimization of the objective or cost function J tofind the optimal values of the weight and offset parameters. J is commonly basedon the mean squared error (MSE) between the model output yk and the target dataydk,J =1NN∑n=1{12∑k[y(n)k − y(n)dk ]2}, (1.3)where N is the number of observations. Network parameters were not solvable un-til Rumelhart et al. [1986] applied the back-propagation algorithm, actually derivedearlier by Werbos [1974]. The original method for minimizing the objective func-tion, known as gradient descent, computes the gradient of the objective function bybackward propagation of model errors and steps along this gradient toward a mini-mum, a very slow process. Improvements in the speed of convergence were seen bythe development of the conjugate gradient algorithm [Fletcher and Reeves, 1964],which extended the linear conjugate gradient developed by Hestenes and Stiefel[1952] to nonlinear problems, and quasi-Newton methods, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method later enhanced by Luenberger [1984].As the objective function J has multiple local minima, the nonlinear optimiza-tion algorithm starting with different random initial weights and offset parameterscould converge to different local minima. Hence it is common to train an ensembleof MLP models with different random initializations. The output of the individual3xiykhjInput layer Hidden layerOutput layerFigure 1.1: A schematic artificial neural network with one hidden layer, where xi,h j and yk denote, respectively, the input variables, hidden neurons and outputvariables (source: Hsieh [2009, Figure 4.4]).ensemble members is then simply averaged.With enough hidden neurons, the MLP model can fit a data signal includingthe noise with arbitrary accuracy, which can lead to overfitting. To prevent this, thedata are often divided into training and validation sets. While continued trainingleads to lower error in the training set, the error in the validation set will drop butthen eventually increase. The technique of halting the training when the validationerror reaches a minimum is called early stopping and is commonly used with MLPneural networks [Hsieh, 2009].The number of layers of hidden neurons can also be adjusted. While a sin-gle hidden layer is adequate to model any nonlinear continuous function [Bishop,2006], very complex representations may benefit from additional layers. Manyreadily available MLP packages such as the R package monmlp [Cannon, 2015]4are limited to one or two hidden layers. The addition of many hidden layers allowsthe MLP model to learn highly complex mapping relationships. In this way MLPhas evolved into so-called “deep learning” [Goodfellow et al., 2016], now at theforefront of modern machine learning research.1.2 Snow Water Equivalent Measurement1.2.1 Manual Snow SurveysManual snow surveys (MSS) are conducted by the BC Ministry of Environment’sSnow Survey Program. In situ measurements of snow depth and SWE throughoutthe province are taken through this program, the guiding principle of which ismeasurement consistency. Survey sites, which are typically located between 1000and 2000 meters above sea level (masl), are sampled year after year. Five to tenpoints are chosen at each site within an area that is ideally sheltered with littleoverhead canopy coverage. Measurements are made with a Standard Federal SnowSampler, an aluminum tube with a cutter bit affixed to the end. The tube is poundedinto the snowpack and used to extract a core of snow, which is measured for lengthand weighed using a scale calibrated in centimeters of water equivalent depth.The timing of the measurements is the beginning of the month January throughJune, plus midmonth May and June, within a window of ±5 days. Not all sites aremeasured at each survey interval. The beginning of April sees the most surveystaken throughout the year, since this is the time of the largest accumulations formost sites. Lower elevation sites see little to no snow late in the season, so thesesites are not scheduled for sampling. Records for some sites go back as far as 1935,though sites have been dropped and added throughout the years. Of 470 total sitesin the Snow Survey database, 256 have at least ten years of measurements from1980 to 2010.1.2.2 Automated Snow Weather StationsAutomated Snow Weather Stations (ASWS) are monitoring stations located through-out the province which contain basic meteorological instruments and a device tomeasure SWE, either a snow scale or snow pillow. The ASWS network is made up5of stations operated by the Ministry of Environment, BC Hydro, Rio Tinto Alcanand the Greater Vancouver Water District. Of the 71 stations that have ever beenset up, 51 are currently operational.Snow scales and pillows determine SWE by recording the height of antifreezesolution pushed up a standpipe by the weight of the snow. Stations also includetemperature sensors, precipitation gauges and snow depth sensors, which measurethe distance to the snow surface from a fixed point above using sensors similarto those in autofocus cameras. The measurements are transmitted hourly to theGeostationary Operational Environmental Satellite (GOES) system for relay backto the BC River Forecast Centre, which computes and published daily SWE data.Data from the ASWS (as well as those from the MSS) are available through the BCRiver Forecast Centre website (https://www2.gov.bc.ca/gov/content/environment/air-land-water/water/drought-flooding-dikes-dams/river-forecast-centre, updated asof December 31, 2017).1.3 Gridded ProductsNumerous readily available data products provide continuous gridded SWE esti-mates. These products vary in temporal and spatial extent, but they generally of-fer at a minimum hemisphere-wide to global coverage throughout the satellite era(i.e. 1980-present). Broadly speaking, these products can be grouped into severalclasses.Reanalysis products rely on the integration of large and varied data sets of earthobservations. These observations can include data from satellites, meteorologicalstations, aircraft, radiosondes, ships and buoys. The key to reanalyses is ingest-ing of climate data using an unchanging data assimilation scheme. Two state-of-the-art reanalyses are widely used. The European Centre for Medium-RangeWeather Forecasts (ECMWF) Reanalysis ERA-Interim [Dee et al., 2011] uses itsIntegrated Forecast System (IFS) to incorporate state measurements of land, seaand air. Similarly, the Modern-Era Retrospective analysis for Research and Ap-plications, or MERRA [Rienecker et al., 2011], relies on the Goddard Earth Ob-serving System Data Assimilation System Version 5 (GEOS-5) for assimilating ahost of observational data. These two reanalyses also have associated off-line re-6plays with updated land surface models (LSMs) and precipitation adjustments thatseek to improve the representation of land surface states. ERA-Interim/Land [Bal-samo et al., 2015], hereinafter identified as ERALand, uses a recent version of theland surface scheme HTESSEL (Hydrology-Tiled ECMWF Scheme for SurfaceExchanges over Land) and precipitation adjustments based on the Global Precip-itation Climatology Project (GPCP) v2.1. MERRA-Land [Reichle et al., 2011]utilizes the Fortuna-2.5 LSM with the NOAA Climate Prediction Center “Unified”(CPCU) precipitation product.Another key class of available gridded SWE data is the land data assimila-tion system (LDAS), which represents the land surface using observational and/orreanalysis forcings in a LSM. NASA’s Global Land Data Assimilation System(GLDAS) [Rodell et al., 2004] applies meteorological forcing data sets to one offour possible LSMs: Common Land Model (CLM) V2.0, Mosaic, Noah Model3.3 and the Variable Infiltration Capacity (VIC) model. GLDAS1 relies on threedifferent forcing data sets: the Global Data Assimilation System (GDAS) atmo-spheric analysis fields, disaggregated Climate Prediction Center Merged Analysisof Precipitation (CMAP) fields, and Air Force Weather Agency (AFWA) radiationvalues. GLDAS2 uses only the Princeton meteorological dataset [Sheffield et al.,2006].A final class a gridded SWE products is based primarily on observationaldata. GlobSnow Pulliainen et al. [1999] uses the Helsinki University of Technol-ogy (HUT) snow microwave emission model, satellite-derived passive microwavebrightness temperature and a background snow depth field built by kriging inter-polation of synoptic observations. The Canadian Meteorological Centre (CMC)Daily Snow Depth Analysis Data [Brown and Brasnett, 2015] similarly utilizesinterpolated observations to create a gridded snow depth field along with a simplesnow model driven by the Global Environmental Multiscale (GEM) forecast modelto create monthly SWE estimates.Further information for all of the above data products is presented in Section2. Research Goal and ObjectivesThe overall goal of this work is to improve estimates of SWE in mountainous re-gions, in particular the province of BC. Several gridded data products include con-tinuous SWE fields at a variety of spatial and temporal resolutions in BC. Manysuch products struggle to accurately represent SWE in mixed alpine regions ofcomplex topography, dense landcover and very high snow accumulations, condi-tions that aptly describe BC. The first objective of this work is to compare eightselected gridded SWE products with in situ SWE measurements. Correlations, bi-ases and mean absolute errors (MAEs) are generated for each product based onmanual snow surveys. Biases and MAEs are further normalized to make compar-isons independent of the magnitude of SWE accumulation. Gridded SWE productsare ranked by overall performance across the province.A second objective is to determine which combination(s) of gridded SWE prod-ucts in a machine learning algorithm are most effective in estimating SWE. Six ofthe ranked products have temporal coverage through the satellite era (1980-2010):ERA-Interim [Dee et al., 2011], ERALand [Balsamo et al., 2015], MERRA [Rie-necker et al., 2011], MERRALand [Reichle et al., 2011], GLDAS2 [Rodell et al.,2004] and GlobSnow Pulliainen et al. [1999]. Each of six products is run in anANN with a base set of site covariates but in isolation from the other products.All combinations of the top three products as well as combinations of the top fourthrough six products are further run with the same covariates. Mean station corre-lations and MAEs of the ANNs, determined using cross-validation, are comparedwith those of the original gridded products and their means, likewise configuredmultiple linear regression (MLR) models and the calibrated Variable InfiltrationCapacity (VIC) model [Pacific Climate Impacts Consortium, 2014].The third objective is to improve the best performing machine learning modelbased on a data fusion of gridded SWE products by incorporating additional datasources. Such data include enhanced terrain covariates and modeled SWEs from asimple one-layer snow model. Enhanced terrain covariates are calculated using theGLOBE 1-km DEM [Hastings et al., 1999]. The SnowMelt snow model [Walteret al., 2005] is driven by gridded temperature and precipitation fields [Hopkinsonet al., 2011, McKenney et al., 2011] bias-corrected to climate normals [Wang et al.,82012]. Finally, the best performing ANN is used to generated a high-resolutionSWE map for the entire province of BC. The three objectives are pursued sequen-tially in Chapters 2, 3 and 4, respectively.9Chapter 2Gridded Product Comparison2.1 IntroductionWinter snow is the primary source of moisture storage over much of western NorthAmerica in winter and spring [Wood and Lettenmaier, 2006]. Snow depth is im-portant in both numerical weather prediction and climate modeling applicationsas it controls the ground surface thermal insulation and hence the strength of theland-atmosphere coupling, providing lower atmosphere boundary conditions oversnow-covered regions [Dutra et al., 2012]. Snow water equivalent (SWE), definedas the depth of a layer of water having the same mass and area, is more useful forhydrologists than snow depth because it represents the amount of water available tothe ecosystem once the snow melts. Snow density, which can be used to find SWEfrom snow depth, has been shown to vary from 50 kg m-3 for new snow at lowair temperatures [Cordisco et al., 2006] to over 550 kg m-3 for ripened snowpacks[Anderton et al., 2004]. However, a recent study [Sturm et al., 2010] using a setof 25,668 snow measurements from three countries and two continents found that50% of all bulk density measurements for seasonal snow lie in a range between244 and 375 kg m-3, and 80% lie between 194 and 435 kg m-3. While there are nu-merous ways to measure site-specific SWE, finding accurate distributions of SWEon a regional scale can be a challenging task. Over large regions, remote sensingof microwave radiation signatures has been used to infer SWE [Chang et al., 1987,Tong et al., 2010, Kongoli et al., 2004].10While the science behind remote microwave sensing of snowpack has beendeveloping for many years, several environmental factors have limited increases inthe accuracy of the methods. One of the most challenging is forests, still coveringmuch of British Columbia (BC), which present two-fold difficulties. Plants area strong absorber and block much of the upwelling microwave radiation in boththe 19 and 37 GHz bands [Langlois et al., 2011]. Forests also exhibit a stronginfluence on snow distributions by canopy interception, radiation balance changesand wind speed reductions, resulting in reductions in accumulation by up to 40%and ablation by up to 70% relative to nearby open areas [Varhola et al., 2010], andmaking forest cover the most highly correlated variable for snow accumulation andmelting.Following land cover, elevation is generally the second best predictor of SWE[Varhola et al., 2010]. Sites at higher altitudes experience lower temperatures, re-sulting in earlier starts to the snow season as well as later melting than comparablesites at lower elevations. The effect of land cover is itself affected by elevation,with larger differences in melting between forests and clearcuts seen at higher alti-tudes. Topography can also impede remote sensing of the snowpack by enhancingredistribution (i.e. falling or blowing from higher peaks into deeper valleys), com-plicating SWE retrieval algorithms. Previous studies of topographic controls on theredistribution of snow by wind [Anderton et al., 2004] further highlight the con-nection and at times complex interplay between topography and land cover, bothof which are also independent predictors of SWE.Beyond microwave retrieval algorithms, numerous tools have been developedto make large-scale estimates of SWE. Land Data Assimilation Systems (LDAS)represent the land surface using ground-based observational data products, re-motely sensed products and/or reanalysis forcings in a land surface model (LSM).NASA’s Global Land Data Assimilation System (GLDAS) runs on one of fourLSMs with two possible meteorological forcings [Rodell et al., 2004]. GLDAS1uses a combination of three meteorological forcing datasets for 1979 to present,while GLDAS2 is forced only by the Princeton meteorological dataset [Sheffieldet al., 2006] for the period 1948-2010.Reanalysis products ingest a large number of earth observations over a long his-torical period into a numerical weather prediction model using an unchanging data11assimilation scheme to produce a consistent climatological record. The EuropeanCentre for Medium-Range Weather Forecasts (ECMWF) Reanalysis ERA-Interim[Dee et al., 2011] uses the Integrated Forecast System (IFS) to assimilate land sur-face, oceanographic, atmospheric and spaceborne measurements from a host ofsources. Similar data sources are assimilated using the Goddard Earth ObservingSystem Data Assimilation System Version 5 (GEOS-5) to produce the Modern-Era Retrospective analysis for Research and Applications, or MERRA [Rieneckeret al., 2011].A special class of reanalysis products focuses on improving the representationof land surface processes. A strategy for doing this is to run an offline rerun ofjust the land processes with improved land surface models and/or forcings. Suchproducts exist for both ERA-Interim and MERRA. ERA-Interim/Land [Balsamoet al., 2015], hereinafter referred to as ERALand, utilizes both the improved landsurface scheme HTESSEL (Hydrology-Tiled ECMWF Scheme for Surface Ex-changes over Land) and precipitation adjustments based on the Global PrecipitationClimatology Project (GPCP) v2.1. MERRA-Land [Reichle et al., 2011] likewiseupdates the land surface model to Fortuna-2.5 and incorporates the NOAA ClimatePrediction Center “Unified” (CPCU) precipitation product.Finally a number of gridded products are derived from observational datasets.Two such datasets are GlobSnow [Pulliainen, 2006], which bases its SWE prod-uct on the combination of satellite-based microwave radiometer and ground-basedweather station data, and the Canadian Meteorological Centre (CMC) Daily SnowDepth Analysis [Brown and Brasnett, 2015], which starts with a first guess fieldand then builds gridded snow output from observations using optimal interpola-tion. Further details on all these products are given in the description of methods.Inter-dataset spreads have been previously characterized for these datasets forthe entire Northern Hemisphere [Mudryk et al., 2015]. The datasets were foundto exhibit higher consistency over boreal forest vs. alpine regions. Much of BC,however, is mountainous, particularly along the coast, throughout the north, and inthe southeast. Therefore, an evaluation of these products over BC may give insightinto their relative performance in this topographically challenging area.The objective of this study is to compare various gridded products with in situsnow survey measurements over five physiographic regions of BC. The compari-12son of point measurements to areal averages provided by the gridded products isclearly imperfect. Single location snow measurements cannot appropriately repre-sent entire cells when applied to coarse grid SWE datasets [Mudryk et al., 2015].Given the sparseness of measurements, it is not possible to fully capture subgridspatial variability. SWE is strongly influenced by topographic controls and in par-ticular elevation. Differences between station altitude and mean grid cell elevationcan lead to over- or underestimation, and thus some level of irreducible error. Thiscaveat aside, correlations computed using interannual time series can show abilityto reproduce interannual deviations. Furthermore, seasonal trends in bias and MAEwould be largely unaffected by the absolute values of these measures. Therefore,the comparisons employed in this work can yield some insight into the performanceof the products, issues of scale notwithstanding.The general goal of the effort is to determine which products best representboth the magnitude and the interannual variability in BC. Such a result will alsoprove useful to efforts to utilize one or more of the evaluated products such as ina data fusion project using multiple predictors to generate a large-scale SWE field.Furthermore, evaluating how well a range of snow products representing multiplemodel classes (reanalyses, LDAS and observation-based) estimate SWE in thisparticularly challenging region will yield insight into context-specific performanceand more generally provide guidance on the use of such datasets in mountainous,forested regions.2.2 Study AreaBritish Columbia is Canadian province on the west coast of North America situatednorth of the continental United States and south of the US state of Alaska. It is alarge and diverse province 95.2 million hectares in area. BC stretches from 48°Nto 60°N and 114°W to 140°W. Prominent features include a jagged coastline withmany islands and several large mountain chains running approximately parallel tothe coast. Much of the province is cool, moist and forested, though Mediterranean,semi-arid, subarctic and alpine climates are also found [British Columbia Ministryof Forests, 1995]. Mountainous areas in particular receive large amounts of wintersnow, generating yearly freshets that can lead to extensive lowland flooding but13which also supply much of the water demand during the dry summer months.British Columbia can be separated into five physiographic regions: the CoastMountains and Islands, Interior Plateau, Columbia Mountains and Southern Rock-ies, Northern and Central Plateaus and Mountains, and Interior Plains. These re-gions, overlaid on a digital elevation model [Hastings et al., 1999], are shown inFigure 2.1, and Table 2.1 compares region characteristics including area, mean ele-vation and mean Terrain Ruggedness Index, TRI, as defined by Riley et al. [1999].The number of stations and mean MSS SWE values by survey month for 1980-2009 are also shown. Comparing the SWE values, April is found to be the monthof highest mean accumulation across all regions and hence will be used as the timeof mean peak SWE throughout.The Interior Plains region of BC, hereafter referred to as the “BC Plains”, oc-cupies the northeast corner of the province. It is the region of lowest relief andmean elevation, 642 masl (meters above sea level). Separated from the coast byhundreds of kilometers, this region receives the least snow of all physiographic re-gions, with a mean April SWE of 88 mm. It is the smallest region by area (10.6million hectares), and though there are only 5 snow stations here, the flat topog-raphy and inland location mean that high snow variability is not characteristic ofthis region, although wind drifting can generate high variability in low-relief land-scapes.Situated in the middle of the province, the Interior Plateau has twice the areaand elevation of the BC Plains. Except for major rivers and the highlands on theeastern margin, the topography is still relatively flat. It is the second lowest regionof accumulation based on an average April snow survey value of 283 mm, and ithas the most snow stations of all physiographic regions.The Northern and Central Plateaus and Mountains is the largest region in theprovince at 28.6 million hectares. The plateaus to the north have a flat to rollingtopography, while the mountain ranges are lower than those of other regions. Over-all it is a moderately rugged region and receives intermediate snow accumulationrelative to other regions, with a mean measured April SWE value of 337 mm.The Columbia Mountains and Southern Rockies in the southeast has the high-est mean elevation of all regions (1599 masl). Several rugged mountain belts andsteep-sided valleys comprise this region. It has the second highest snow accumu-14Figure 2.1: Physiographic Regions of British Columbia, Canada, overlaid on theGLOBE 1-km DEM. Elevations are shown according to the legend with units ofmeters above sea level. The five physiographic regions are outlined in red: (A)BC Plains, (B) Interior Plateau, (C) Northern and Central Plateaus and Moun-tains, (D) Columbia Mountains and Southern Rockies, and (E) Coast Mountainsand Islands. SWE accumulation generally increases from region A to region E.Manual Snow Survey station locations are indicated by green dots. Inset: Amap of Canada (source: Wikimedia Commons) shows BC highlighted in red.The province is bordered to the south by the continental United States (USA)and to the west by the US state of Alaska (AK).15lation in the province with an average of 565 mm SWE measured during Aprilsurveys, as well as the highest average station density (70 stations per 11.3 millionhectares).The Coast Mountains and Islands lines the jagged coastline of the Pacific Oceanand spans the entire north-south extent of the province. It is comprised of twomajor parallel mountain belts and a submerged coastal trough. This highly ruggedregion has the greatest snow accumulations in the province. April mean SWE for1980-2009 at stations here is 782 mm.A glimpse of SWE spatial patterns for these regions is presented in Figure 2.2.ERALand April 1 SWE maps are shown for high snow year 1999 (April meanSWE 738 mm) and low snow year 2005 (April mean SWE 345 mm). During thehigh snow year, large areas of high accumulations are particularly noticeable in thesouthern part of the Coast Mountains and Islands. Conversely the Interior Plateauis nearly snow free in April during the low snow year. Considering time series ofobserved SWE during this same period (Figure 2.3) shows the difference in theseyears is particularly prominent in regions of higher accumulation. While most ofthe gridded products do a fairly good job of representing lower accumulations ofSWE (Figure 2.3a), they all struggle to capture very high accumulations (Figure2.3e). Spatiotemporal patterns such as these will be analyzed by generation ofyearly time series and determining median statistics by region.16Longitude (deg W)Latitude (deg N)48505254565860140 135 130 125 120 115(a)Longitude (deg W) 140 135 130 125 120 115(b)02004006008001000Figure 2.2: ERALand SWE maps for April 1 for the high and low years (a) 1999 and(b) 2005, respectively. The five physiographic regions of BC are outlined in redfor reference. Grayscale shading indicates mm SWE according to the legend,while white indicates SWE values over 1000 mm.17050100150200  (a)0200400600  (b)05001000SWE (mm)  (c) MSSGlobSnowCMCGLDAS1GLDAS2ERAInterimERALandMERRAMERRALand0500100015002000  (d)1999 2000 2001 2002 2003 2004 200501000200030004000  (e)Figure 2.3: SWE time series for snow course stations and gridded products in each ofthe 5 physiographic regions of BC: (a) station 4C05 Fort Nelson Airport in theBC Plains, (b) station 2F11 Isintok Lake in the Interior Plateau, (c) station 4B08Mount Cronin in the Northern and Central Plateaus and Mountains, (d) station2A25 Kirbyville Lake in the Columbia Mountains and Southern Rockies, and(e) station 1D08 Stave Lake in the Coast Mountains and Islands. Note that MSSare point measurements, while gridded products are spatially distributed.18Table 2.1: Physiographic Region SummaryRegion Area (×106 ha) Mean TRI (m) # stations Mean SWE (mm)Jan Feb Mar Apr May JunBC Plains 10.6 67 5 49 68 84 89 15 0Interior Plateau 21.1 158 81 155 196 258 286 237 173Northern & Central Plateaus & Mountains 28.6 268 43 253 276 343 397 378 331Columbia Mountains & Southern Rockies 11.3 485 70 309 428 515 590 581 387Coast Mountains & Islands 25.3 560 57 402 523 673 840 821 637192.3 MethodsBecause there is considerable variability in the climate and topography throughoutthe region, SWE is far from a homogeneous field. To examine the performanceof the evaluated gridded products in a spatial context, it is instructive to considerthe evaluation statistics in various subdomains. Each of the five physiographicregions shown in Figure 2.1 and described in the previous section has a distinctivetopographic profile, climate context and mean SWE accumulation. These regionsthus provide a basis for such a spatial examination.2.3.1 Data SourcesIn order to evaluate the usefulness of gridded products in estimating regional-scaleSWE, values from several gridded snow products were compared with in situ man-ual snow survey measurements over province of BC. Classes of SWE gridded prod-ucts include observational and satellite-based products, reanalyses, and land dataassimilation systems. These products are summarized in Table 2.2 and describedin further detail in this section.Table 2.2: Gridded SWE Product Data Evaluated in this StudyProduct ClassSpatialResolutionTemporalRangeTemporalResolutionGlobSnow observational 25 km Dec 1979-May 2011 dailyCMC observational 24 km Aug 1998-Jun 2012 monthly*GLDAS1 LDAS 0.25° Feb 2000-Jun 2012 dailyGLDAS2 LDAS 0.25° Dec 1978-Jun 2010 dailyERAInterim reanalysis 0.75° Jan 1979-Jun 2012 dailyERALand reanalysis 0.75° Jan 1979-Dec 2010 dailyMERRA reanalysis 1/2°× 2/3° Jan 1979-Jun 2012 dailyMERRALand reanalysis 1/2°× 2/3° Jan 1980-Jun 2012 daily*Monthly values evaluated against beginning of month survey. Manual Snow Surveys (MSS)Manual snow surveys are conducted at sites throughout BC by the Snow SurveyNetwork Program for the BC River Forecast Centre. The results of these surveys20are used to estimate flood risk and water supply outlook through freshet and sum-mer. Five or ten points along a course or path within the area are sampled at eachsite using a Standard Federal Snow Sampler, which extracts a column of snow andis then weighed to determine SWE. Though the number of surveys in the historicalrecord varies by survey month and year, the locations of sites remains consistent sothat interannual comparisons are possible. The surveys are conducted up to 8 timesa year, once at beginning of months January through June and once mid-month inMay and June. Mid-month surveys are not considered in this work, meaning upto 6 surveys per year are included. Data from snow pillows at Automated SnowWeather Stations (ASWS) are also not included, as they were found to include nu-merous errors and require a concerted quality control effort. There are currently167 active and 216 inactive manual snow survey sites within the province, somewith records going back as far as 1935. For the purposes of this work, manual snowsurveys are considered the basis of comparison for evaluating gridded products. GLDASThe Global Land Data Assimilation System [Rodell et al., 2004] uses advancedland surface modeling plus data assimilation and/or reanalysis forcing to representthe land surface state. Satellite and ground observations are ingested and used todrive one of four offline land surface models (LSMs): Mosaic, CLM2, Noah, orVIC. The Noah LSM is available at the highest resolution (0.25°) and is used forthis study.Two different versions of GLDAS exist. The primary difference in these ver-sions is the application of meteorological forcings schemes. GLDAS1 uses a com-bination of Climate Prediction Center’s Merged Analysis of Precipitation (CMAP)for mean rain rate, Air Force Weather Agency (AFWA) shortwave and longwaveradiation, and other meteorological forcings from NCEP’s Global Data Assimi-lation System (GDAS), producing a high quality forcing set from mid-2001 on-ward. GLDAS2 is forced by the Princeton meteorological dataset, a reanalysisproduct that uses observational products for bias correction, and is suitable forlong term studies. Other parameters are specified as follows. Vegetation sub-gridvariability is simulated by tiling a 1-km global vegetation dataset. GLDAS1 land21cover is based on the Advanced Very High Resolution Radiometer (AVHRR), whileGLDAS2 uses the Moderate-resolution Imaging Spectroradiometer (MODIS). Clas-sification scheme is based on that used in the land surface model. GLDAS1 remapsthe various schemes to University of Maryland (UMD) vegetation types, but nosuch re-mapping is performed for GLDAS2; for instance the modified IGBP 20-category (16 IGBP vegetation classes plus 3 tundra classes) is used in the GLDAS2Noah implementation. Soils information for GLDAS1 is based on the Food andAgriculture Organization (FAO) Soil Map of the World, while for GLDAS2 soiltexture maps from the LSM developers were used, which in the case of the NoahLSM is the FAO 16-category soil texture class. Topography in both productsis defined by the GTOPO30 Global 30 Arc Second (∼1 km) Elevation Dataset.GLDAS model data were downloaded from the Goddard Earth Sciences Data andInformation Services Center (GES DISC) on July 2, 2014 at the following URL:http://disc.sci.gsfc.nasa.gov/hydrology/data-holdings. ERA-Interim and ERA-Interim/LandThe European Centre for Medium-Range Weather Forecasts (ECMWF) Reanaly-sis ERA-Interim [Dee et al., 2011] is a global reanalysis product covering 1979 topresent. Data from a wide range of land surface, oceanographic, atmospheric andspaceborne measurement systems are assimilated by the Integrated Forecast Sys-tem (IFS). The IFS version used to produce ERA-Interim (Cy31r2) was released in2006. Atmospheric and surface parameters are represented globally at a resolutionof approximately 80 km (T255 spectral) at 60 vertical levels. Atmospheric fieldsare available at 6 hour intervals with vertical integrals available daily, while sur-face fields are produced every 3 hours. Monthly averages are available for surface,single level and model level fields, as well as forecast parameters for the four mainsynoptic hours: 0, 6, 12 and 18 UTC.ERA-Interim/Land [Balsamo et al., 2015] is an offline land surface simulationproduced with the latest ECMWF land surface model and driven by ERA-Interimmeteorological forcings. The Hydrology-Tiled ECMWF Scheme for Surface Ex-changes over Land (HTESSEL) land surface model includes a number of enhance-ments to the TESSEL scheme used in ERA-Interim. These enhancements include22the use of globally varying soil textures, an improved runoff representation, a newsnow scheme, a vegetation climatology based on satellite measurements, and anevaporation formulation using a lower stress threshold for bare soil. Precipitationadjustments were also made by bias correcting the ERA-Interim precipitation es-timates based on the Global Precipitation Climatology Project (GPCP) v2.1 data.The ERA-Interim/Land reanalysis dataset has been compared and verified againstnumerous ground-based and remotely sensed observations, including those for soilmoisture, snow depth, surface albedo, turbulent latent and sensible fluxes, and riverdischarges. These comparisons have found that this dataset represents an overallimprovement in land surface representation. MERRA and MERRA-LandThe Modern-Era Retrospective analysis for Research and Applications [Rieneckeret al., 2011] reanalysis is a gridded product created by NASA. This reanalysis usesa fixed assimilation system, the Goddard Earth Observing System Data Assimila-tion System Version 5 (GEOS-5), to incorporate NASA’s EOS observations intoa coherent representation of mass and energy flows from the earth’s surface tothe upper atmosphere. Native horizontal spatial resolution is 1/2° latitude × 2/3°longitude. On land 2D diagnostics such as surface fluxes, vertical integrals andland states are available at 1-hour intervals; 3-hourly atmospheric diagnostics on42 pressure levels are available at coarser (1.25°) resolution, while analyses on 72levels extending up through the stratosphere are available every 6 hours at nativeresolution.MERRA-Land [Reichle et al., 2011] is an offline rerun of the MERRA landmodel with two notable differences. Precipitation forcing was based on mergingthe global gauge-based NOAA Climate Prediction Center “Unified” (CPCU) pre-cipitation product [Chen et al., 2008] with MERRA precipitation. CPCU V1.0 wasused for the period 1979-2005, and real-time CPCU was used for 2006 to present.Secondly, the land surface model was updated to Fortuna-2.5, used in GEOS-5.7.2operationally since August 2011. As a result of these changes, documented im-provements over MERRA including increases in runoff and snow depth skill sug-gest MERRA-Land is more accurate and is thus recommended for land surface23studies [Reichle et al., 2011]. Inconsistencies in the precipitation forcings for thisproduct, however, have called into question its ability to accurately represent SWE[Reichle, 2012]. GlobSnowGlobSnow [Pulliainen, 2006, Luojus et al., 2010, Takala et al., 2011] is a productof the Finnish Meteorological Institute (FMI). The algorithm relies on a forwardmodel of observed brightness temperature (Tb) as a function of snow pack proper-ties. Effective snow grain size is estimated locally by fitting the results of a snowemission model to the measured brightness temperature channel difference mostcommonly used to derive SWE (19 GHz − 37 GHz). A background field of snowdepths is then created by applying kriging interpolation to synoptic snow depthobservations, and the associated background SWE is estimated using snow densityreference information. Finally retrieved SWE is found by weighting these val-ues by their estimated variances. The SWE product is mapped to the Equal-AreaScalable Earth Grid (EASE-Grid) at a resolution of 25 km. Thematic accuracy istargeted to be 30-40 mm SWE for accumulations of less than 150 mm SWE. Thedata available by public ftp includes all non-mountainous areas in the NorthernHemisphere. A prototype “mountains unmasked” dataset including all of BC wasobtained from FMI on May 26, 2013. CMCThe Canadian Meteorological Centre (CMC) Daily Snow Depth Analysis Data[Brown and Brasnett, 2015] is a Northern Hemisphere subset of the CMC globaldaily snow depth. The dataset is based on snow depth observations from surfacesynoptic observations (synops), meteorological aviation reports (metars) and spe-cial aviation reports (SAs). A simple snow accumulation and melt model driven byoutputs from the CMC Global Environmental Multiscale (GEM) forecast modelis first used to create a first-guess snow field, and then the gridded snow depthoutput is generated from observations using optimal interpolation. Monthly SWEestimates are created using a table of mean monthly snow density values for var-ious snow-climate land classes over Canada: tundra, taiga, maritime, ephemeral,24prairie and alpine [Sturm et al., 1995] and projected to a standard 24 km polarstereographic grid [Brown and Mote, 2009].2.3.2 Assignment of Grid Cells and Construction of Time SeriesEach manual snow survey site was matched with its containing grid cell for eachgridded product. Averaging of snow survey measurements for stations within asingle grid cell was explored, but differences in available dates meant that morethan 40% of the observations would be discarded, an unacceptably high loss giventhe limited data available. Each station was therefore considered separately inthe analysis. Yearly time series consisting of observations for all available yearsfor each station and survey were constructed. Up to 6 yearly time series, onefor each survey date, potentially represent each station, depending on observationavailability. The median number of years considered for each product and surveyis shown in Table 2.3.Table 2.3: Gridded SWE Product Median Number of Years Analyzed by SurveyMonthProduct Jan Feb Mar Apr May JunGlobSnow 21.5 28 29 30 30 13CMC 13 13 14 14 14 13GLDAS1 11 12 13 13 13 12GLDAS2 21 28 30 30 30 18ERAInterim 22 30 32 32 31.5 18ERALand 21 28 30 30 30 18MERRA 22 30 32 32 31.5 18MERRALand 22 29 31 31 31 17.5For each gridded product, a single SWE value on or closest to the date of eachobservation was used in the time series. The following filters were applied. Fordaily gridded products, a maximum of 7 days was allowed between the observationdate and that of the product. The reason for the use of a ± 7 day window was thatsome gridded products are not available every day, so it was necessary to find theclosest date when an exact match was not found. SWE values represented on datesthat differed from those of the observations by more than 7 days were not used.For monthly products, the window was± 2 weeks, and the mid-May and mid-June25surveys were not evaluated. Time series that were less than 10 years long were alsodiscarded. Stations were then grouped by physiographic region.2.3.3 Performance ScoresUsing the constructed parallel time series between the evaluated gridded productsand manual snow surveys, various performance metrics were calculated in orderto describe their relationship. The metrics were then used to rank the productsaccording to their relative values. Knowing how each evaluated product ranks incomparison with the others gives insight into which products best capture interan-nual variability, estimate mean snow state, and minimize overall error. This knowl-edge in turn will help in the selection of products for use in estimating snow inBC and in future research on improving snow estimation methods. The followingmetrics were calculated and used in this ranking. Pearson CorrelationThe Pearson correlation is defined as the covariance of two time series, x and y,divided by the product of the standard deviations of each respective series, as fol-lows:rxy =cov(x,y)sxsy, (2.1)wherecov(x,y) =1N−1N∑i=1(xi− x)(yi− y), (2.2)sx =√1N−1N∑i=1(xi− x)2, (2.3)and likewise for sy. N is the number of observations in each interannual time seriesof survey months (January, February, etc.) for each station, and overbar denotesthe mean. Correlation is a measure of the strength of the linear relation betweeneach gridded product time series and its corresponding MSS time series. Since thecompared sequences are yearly time series, Pearson correlation values represent theability of a gridded product to capture interannual variability in SWE as reflectedin the manual snow series. Bias and MAEBias, the difference in the means of the time series of gridded products (x) andmanual snow surveys (y), gives the average offset of gridded product values:bias = x− y. (2.4)Mean absolute error (MAE), the average of the absolute differences in products (x)vs. measurements (y), represents a measurement of the error associated with eachgridded product time series:MAE =1NN∑i=1|xi− yi|. (2.5)Hence bias gives a sense of the mean difference, whereas MAE shows the spreadin those differences. Because the amount of snow measured at MSS stations variesover a large range of values, the value of the bias and MAE also vary considerably. Normalized Bias and MAEA bias or MAE value, seemingly large at a station receiving typically little snow,may appear small at a station receiving much more snow. To reflect this incon-gruity, the bias and MAE values were normalized before the values were consoli-dated and comparisons made. One appropriate value for this normalization is the30-year mean peak SWE measured at each station. While the use of the mean peakSWE tends to minimize the error metrics, selection of a constant station value willyield errors that reflect the seasonal cycle (e.g. errors are highest near maximumSWE and lower on the season shoulders). This normalization also ensures thatgridded products with a bias or MAE less than the mean peak SWE will have anormalized value between −1 and 1, facilitating comparisons. The value chosenfor normalization was the mean observed April SWE at each station for all yearsduring 1980-2009 that were available.272.4 ResultsFive metrics were calculated for each station and survey time series for which 10years of both manual snow surveys and gridded product values are available. Avisual overview of product performance is shown in Figure 2.4. The normalizedMAE for the April survey (time of maximum average accumulation) at each sta-tion for each product is depicted using a color scale from white to red for valuesof 0 to 1 with dark red representing values over 1. Several general observationscan be made. While the highest station density is seen in the southern part of theprovince, so too are some of the highest MAEs, as many red and dark red stationsare seen for most products. GlobSnow and ERAInterim appear to have mostlyhigh normalized MAEs in the south, while GLDAS2, ERALand and MERRA dis-play more white and pink stations. Further north, many lower normalized biasesare seen for several products, including CMC, GLDAS1, ERALand, MERRA andMERRALand. In order to conduct a quantitative examination the metrics were ag-gregated, first across BC to study the seasonal dependence, then by physiographicregion to evaluate the products on a smaller scale.2.4.1 Seasonal Performance Across BCPerformance metrics between the evaluated gridded products and manual snowsurveys have been calculated by survey month. Plots of the following performancemetrics are shown in Figure Pearson CorrelationDecreasing median correlations indicate a decreasing ability for interannual dis-crimination through the snow season, October to June (hereinafter referred to assimply the season). ERALand has the highest median correlation values, beginningthe year at 0.8 and decreasing to 0.5 at end of season (Figure 2.5a). MERRA alsobegins strongly, with correlations close to those of ERALand in January throughApril (median correlation 0.6 to 0.7), but it then decreases dramatically in Mayand June. CMC and GLDAS2 display a drop off in correlation toward season endsimilar to that of ERALand, but these products have lower overall values (median0.6 decreasing to 0.4 at end of season). Other products display relatively weak28505254565860(a) (b)505254565860(c) (d)505254565860(e)  (f) (deg W)Latitude (deg N)Figure 2.4: Median station normalized MAEs for the April survey for the 8 evaluatedproducts: (a) GlobSnow (b) CMC (c) GLDAS1 (d) GLDAS2 (e) ERAInterim(f) ERALand (g) MERRA (h) MERRALand. Normalized MAEs from 0 to 1are represented on a color scale from white to red; values over 1 are representedby dark red.2900.  GlobSnowCMCGLDAS1GLDAS2ERAInterimERALandMERRAMERRALand−400−350−300−250−200−150−100−500(b)Bias (mm SWE)−1−0.8−0.6−0.4−0.20(c)BiasnJan Feb Mar Apr May Jun050100150200250300350400(d)MAE (mm SWE)Jan Feb Mar Apr May Jun00. 2.5: Performance of gridded products relative to manual snow surveys forprimary survey months as measured by the median computed over the whole ofBC. a) Pearson correlation. b) Bias. c) Normalized bias (bias divided by meanstation April SWE). d) Mean absolute error (MAE). e) Normalized MAE (MAEdivided by mean station April SWE).30correlations (median 0.1 to 0.4) with significant seasonal drop offs that suggestsalmost no interannual discrimination at end of season. Bias and MAESignificant negative median bias values are indicative of underestimation of SWEin all gridded products. Increasing median magnitude of bias (Figure 2.5b) andmedian MAE (Figure 2.5d) indicate challenges for gridded products to accuratelyrepresent the snowpack as the season progresses. ERALand has the least negativebias and lowest median MAE of all products, followed by GLDAS2 and MERRA.A trend of increasingly negative median bias values is consistent throughout theentire season for ERALand and CMC. Other products display a change in this trendand in several cases a reversal to a less negative bias at season end. Median MAEincreases throughout the season for CMC and GLDAS1, while other products showa decrease in MAE in June, likely due to the dwindling SWE at that time of theyear. Normalized Bias and MAEThe magnitude of median normalized bias (Figure 2.5c) and the median normal-ized MAE (Figure 2.5e) increase throughout the season to maxima in April or May.The increase in the magnitude of the median normalized bias during the earlierpart of the year is relatively smaller in ERALand and CMC than in the other prod-ucts. The magnitudes of median normalized bias and MAE drop off sharply forthe June survey. These median normalized biases and MAEs approximately followthe typical SWE accumulation curves throughout the season, suggesting bias andMAE are related to SWE accumulation. However, while MAEs for GlobSnow andERAInterim peak in April, the remaining products level off or even peak in May.The increasing errors despite decreasing SWE suggest most of the products beginmelting the snow too quickly at the onset of ablation. These errors are most appar-ent in SWE time series in the higher accumulation regions (Figure 2.3d-e), whereproducts that are underestimating SWE show reductions at end of season, even asmeasured SWE holds constant or continues to increase.31Figure 2.6: Heat maps showing median statistical values by gridded product, surveymonth (January through June), and region: BC Plains (A), Interior Plateau (B),Northern and Central Plateaus and Mountains (C), Columbia Mountains andSouthern Rockies (D), and Coast Mountains and Islands (E). SWE accumula-tion generally increases from region A to region E. Values to the left of each rowrepresent medians across all survey months. Boxes are blacked out when insuf-ficient data was present. Statistics shown are: a) Pearson correlation. b) Biasin mm SWE. c) Normalized bias (bias divided by mean station April SWE). d)Mean absolute error (MAE) in mm SWE. d) Normalized MAE (MAE dividedby mean station April SWE).2.4.2 Physiographic Region CharacterizationsMedian scores for each station and survey month in each region are plotted as aheat map in Figure 2.6. Characterizations of the regions and the performance ofthe products in each region are presented in this section.322.4.2.1 BC PlainsWith relatively flat topography and the lowest SWE values, the BC Plains exhibitsthe most favorable statistics for the majority of surveys and gridded products. Formost evaluated products, the highest median correlations were found in the BCPlains, ranging from a low of 0.6 for MERRA to a high of 0.8 for CMC, with amedian of 0.7. The region also exhibited the lowest magnitude median bias andMAE values, ranging from −2 to −33 mm SWE and 21 to 37 mm SWE respec-tively. Though the range of SWE values is not as large as in other regions, nor-malizing the bias and MAE by the mean April SWE allows comparison of stationswith differences in snow accumulation. Notably, GlobSnow and CMC performexceptionally well relative to other products in this region, exhibiting the lowestmagnitude medians for normalized bias (−0.03) and normalized MAE (0.3). Thisis the only region in which these two products perform well. ERALand, MERRAand GLDAS2 have low magnitude of median normalized bias and MAE relative toother reanalyses and LDAS products. Interior PlateauDespite the relatively low snow accumulations in the Interior Plateau, correlationvalues are the lowest or second lowest of all regions for most gridded products.The exceptions are GlobSnow and ERAInterim, where median correlations are in-termediate relative to their corresponding values in other regions. This findingsuggests that while product performance generally improves with decreasing snowaccumulation, the evaluated products are not capturing interannual variability aswell as might be expected in this particular region. Median normalized bias andMAE values here are very similar to those of the two regions of highest accumula-tion, except that ERALand values are slightly better in this region (−0.2 and 0.4 fornormalized bias and MAE respectively) than in the highest accumulation regions(−0.4 and 0.5 respectively). Northern and Central Plateaus and MountainsThe Northern and Central Plateaus and Mountains has intermediate snow accu-mulations. However it is the second best region in terms of normalized bias and33MAE, and for ERALand and MERRA has values very close to those of the BCPlains (−0.2 and 0.3 respectively). In this region there is an increasingly strongnegative bias and higher MAE at end of season. These findings suggest that manyof the evaluated products are melting off snow too early in the Northern and Cen-tral Plateaus and Mountains. This conclusion is consisent with the SWE time seriesshown in Figure 2.3c, though early melting is more apparent in the higher accumu-lation regions. Columbia Mountains and Southern RockiesIn the Columbia Mountains and Southern Rockies there is a wide range of corre-lation values, with product medians ranging from 0.3 to 0.8. Despite the relativelyhigh accumulations, the median correlations in this region are the second highest ofall regions for three of the evaluated products: GLDAS2, ERALand and MERRA.GLDAS2 has the lowest magnitude median normalized bias at −0.3, followed byERALand and MERRA at −0.4 and the remaining products at −0.5 to −0.7. Me-dian normalized MAE is similarly lowest for these same three products. For othergridded products, the magnitudes of median normalized bias and MAE generallyfollow accumulations relative to other regions (i.e. regions with lower accumula-tions tend to have lower magnitudes of median normalized bias and MAE). Coast Mountains and IslandsThe Coast Mountains and Islands is the region of highest snow accumulation andthe most challenging for the evaluated products to represent. MERRA and ERA-Land show median correlations of 0.6 and 0.7 respectively, while that of GlobSnowis less than 0.1. ERALand has a median normalized bias of −0.3, as compared toother products which range from −0.5 to −0.8. The three best products in termsof normalized MAE are again ERALand, GLDAS2 and MERRA, but the spreadof products other than ERAInterim and GlobSnow is very small.2.4.3 Elevation DependenceOne of the potential pitfalls of the described methods is comparing a point SWEmeasurement at one elevation with gridded areal value that has a significantly dif-34ferent mean elevation. Hence it is useful to characterize the relationship betweenSWE error and elevation error. Pavelsky et al. [2011] examined elevation depen-dence by plotting SWE accumulation error vs. elevation error and found correla-tion coefficients in order to characterize the strength of the relationship. In additionto elevation difference, considering the absolute elevation may also reveal insightinto the relationship. The April survey bias for all stations and each product isplotted against elevation and grid-station elevation difference in Figure 2.7. Allproducts show a strong negative bias for negative grid-station elevation differences(grid mean below station) and stations near 1000 masl. GlobSnow and ERAInterimare negatively biased throughout the entire elevation-difference space, while CMCand GLDAS1 are negatively biased except for the most positive elevation differ-ences. The remaining products are more negatively biased but have broad areas ofpositive bias for positive elevation differences. Percent of variance attributable toelevation and elevation difference, obtained by linearly regressing SWE bias on theelevation and elevation difference covariates, ranges from 17% (GlobSnow) to 41%(MERRA). Products with a higher percent of variance on elevation and elevationdifference have more of their apparent error associated with these parameters.2.4.4 Ranking of Gridded ProductsAcross the province, 256 manual snow survey stations with 10 years or more ofrecords across five physiographic regions reported up to 8 surveys per station, 6 ofwhich were considered here (i.e. mid-month surveys were excluded). An effectiveway to summarize the results is to rank the products by metric. The method appliedby Decker et al. [2012] was applied to determine this ranking. For each stationand survey, the products were ranked based on the scores for evaluated metrics.Correlation, bias and MAE were separately ranked, with rank one being the best.Normalized bias and MAE were also ranked, but because the normalization didnot change the relative ranking at each station and survey, the results (not shown)were identical to those of bias and MAE. Only stations and surveys for whichall products yielded a given metric were used in the ranking. The rank numberswere then averaged across all stations and surveys to find a score for each product.An overall ranking for the products was then determined by averaging these rank35500100015002000 −1000  −500  −500  −500  −500  −500  −500  −500 (a) −1500  −1000  −500  −500  −500  0  0  0  0 (b)500100015002000 −1000  −500  −500  −500  −500  0  0  0 (c) −1000  −500  −500  −500  −500  −500  0  0 (d)500100015002000 −1000  −500  −500  −500  0  0 (e) −1000  −500  −500  −500  −500  −500  0  0  0  0  0  0  0  500 (f)500100015002000−1000 −500 0 500 1000 −1000  −500  −500  0  0  0  0  0  0  500 (g)−1000 −500 0 500 1000 −1000  −500  −500  −500  −500  0  0  0 (h)Grid−Station Elevation Difference (m)Station Elevation (m)−1500−1000−500050010001500Figure 2.7: Bias contour plots, and percent of variance attributable to elevation andelevation difference for (a) GlobSnow, 17% (b) CMC, 22% (c) GLDAS1, 23%(d) GLDAS2, 33% (e) ERAInterim, 22% (f) ERALand, 38% (g) MERRA, 41%(h) MERRALand, 35%. Biases for the April survey are shown as contours on aplane of mean grid cell elevation difference vs. station elevation. Positive biasesand elevation differences indicate mean grid cell values greater than those of therespective stations.36scores, as summarized in Table 2.4.Table 2.4: Rank Score of Gridded Products*Product correlation bias MAE mean scoreERALand 2.19 3.04 2.94 2.72GLDAS2 3.74 2.86 2.82 3.14MERRA 3.09 4.28 4.11 3.83CMC 3.87 4.01 4.13 4.00GLDAS1 5.11 4.58 4.59 4.76MERRALand 5.44 4.96 5.09 5.16GlobSnow 6.12 5.73 5.77 5.87ERAInterim 6.45 6.54 6.53 6.50*Lower rank score indicates better performanceERALand ranked first, having the lowest rank score in correlation and sec-ond lowest in bias and MAE. GLDAS2, ranking second, had the smallest biasand MAE but trailed both ERALand and MERRA in correlation. Third in theranking, MERRA was second in correlation but near the middle of the ranking inbias and MAE. The remaining products, namely CMC, GLDAS1, MERRALand,GlobSnow and ERAInterim, were fourth through eighth across all three metrics,respectively.2.4.5 Confidence IntervalsDifferences in gridded products were evaluated for statistical significance. For eachcalculated performance score, differences were found between each product com-pared to all others in overlapping stations and surveys. 95% confidence intervalson these differences were determined by calculating the verification statistic, inthis case the median, on 5000 bootstrap samples of differences as described in Jol-liffe [2007]. If the middle 95% of these medians (2.5 percentile to 97.5 percentile)are entirely positive or negative, there is a statistically significant difference in theproducts. To be considered better performing, correlations for the evaluated prod-uct must have a statistically significant difference above that of the compared prod-uct, whereas the magnitude of the bias and MAE must be below that of the com-pared product. If the middle 95% of the bootstrapped median differences contains37both positive and negative members, there is no statistically significant differencein the products.Table 2.5 lists adjacently performing product pairs in columns A and B. Foreach comparison, the better performing product in correlation, MAE, normalizedMAE (MAEn), bias and normalized bias (biasn) is indicated by the letter of thatproduct’s column (“A” or “B”); a statistical tie is indicated by an equal sign (“=”).The products are listed in order of overall performance ranking. Each evaluatedproduct in column A outperforms the compared products below it in the perfor-mance ranking for most calculated scores. Exceptions to this include an edge incorrelations that ERALand displays over GLDAS2, while there is otherwise no sta-tistically significant difference between these products. GLDAS2 outperforms allfollowing products in all metrics except correlation, for which GLDAS2 is outper-formed by MERRA and statistically tied with CMC. Few significant differencesare found in the comparison of MERRA and CMC as well as that of GLDAS1 andMERRALand. ERAInterim is statistically outperformed by all other products.Table 2.5: Statistically Significant Performance Comparisons*Product A Product B corr MAE MAEn bias biasnERALand GLDAS2 A = = = =GLDAS2 MERRA B A A A AMERRA CMC A = = = =CMC GLDAS1 A A A A AGLDAS1 MERRALand = A A = =MERRALand GlobSnow A A A A AGlobSnow ERAInterim A A A A A*Letter A or B indicates which of the adjacently performing products on the givenrow is statistically better. An equal sign (“=”) indicates no statistically significantdifference.2.5 Discussion and ConclusionAn assessment of eight SWE gridded products has been conducted. In situ mea-surements taken during manual snow surveys have been compared with values re-ported for the encompassing grid cells of these gridded products across BritishColumbia by survey and physiographic region. Throughout the province, interan-38nual correlations between manual snow surveys and all evaluated gridded productsshow a decrease in median value through the snow season, suggesting a lower abil-ity to discern interannual trends for later surveys. ERALand has the highest mediancorrelations, followed by MERRA, CMC and GLDAS2, though MERRA performsparticularly poorly at season end. The remaining evaluated products exhibit weakpositive correlations that further degrade as the season progresses. The evaluatedgridded products on average underestimate SWE across the region as evidencedby negative bias values determined at a majority of manual snow survey stations.The lowest magnitude bias and MAE values are displayed by, in order, ERALand,GLDAS2 and MERRA. For most products MAEs increase through the season toa peak in April or May before falling off, suggesting increasing accumulationsof snow are increasingly hard to capture. ERALand and CMC display this trendthrough the season end, while other products show reductions in MAE and biasmagnitude in the final surveys of the season.Physiographic region breakdowns generally yield better performance statisticsin regions of lower SWE accumulation, though there are exceptions to this general-ization. The BC Plains in the northeast of the province receives the lowest overallsnow totals. Median correlations in this region range from 0.6 to 0.8, while medianbias and MAE values are−2 to−33 mm SWE and 21 to 37 mm SWE respectively.This is the only region in which CMC and GlobSnow are strong performers, andindeed CMC is the strongest of those evaluated in this region. The relatively poorerperformance of GlobSnow in other physiographic regions is likely due to the uncer-tainty of microwave measurements particularly in complex, snow covered Alpineterrain, which is well documented in the literature [e.g. Tong et al., 2010], suggest-ing this performance result is specific to BC and other mountainous regions. Thesecond lowest region of accumulation, the Interior Plateau exhibits relatively weakgridded product performance statistics. Correlations in the region are the lowest orsecond lowest for most gridded products, and normalized bias and MAE values arelargely in line with those of high accumulation regions. The Northern and CentralPlateaus and Mountains, a region of intermediate snow accumulation, has mediannormalized bias and MAE values that are second only to the BC Plains. End ofseason MAE and bias magnitudes rise quickly however, suggesting an overestima-tion of the melt off rate. The region of second largest SWE accumulation is the39Columbia Mountains and Southern Rockies. The region exhibits relatively highcorrelation values for three of the best performing products: GLDAS2, ERALandand MERRA. Magnitudes of median normalized bias and MAE are only slightlyhigher than in the BC Plains and Northern and Central Plateaus and Mountains.The Coast Mountains and Islands region has the highest mean SWE values and isconsequently the most difficult to estimate. Median correlations range from 0.1 to0.7 and normalized bias values are −0.3 to −0.8. MERRA and ERALand have thebest performance statistics in this challenging region.Comparing these results with previous studies at different sites and scales, sim-ilar or slightly better performance metrics are observed, particularly for the betterperforming products. Carrera et al. [2010] compared SWE and precipitation val-ues for the 2005-2007 snow seasons at 13 survey sites in the Southern CanadianRockies found using two different precipitation schemes, the Canadian Precipi-tation Analysis (CaPA) and the Global Environmental Multiscale (GEM) model.For all SWE measured-simulated pairs at the native 15 km resolution, correlationswere found to be 0.40 and 0.50 for the two precipitation schemes respectively. Thecorrelations found in this current study generally exceed those values, suggestingbetter performance. Previously reported biases were −165 mm SWE for CaPAand −77 mm SWE for GEM. The magnitudes of bias found in this current studyfor all products and stations across the province in a 30 year period are generallyhigher than those previously reported, suggesting poorer performance. However,the best performing products found here have a median bias not far from thosepreviously reported (e.g. GLDAS2 with median bias of −181 mm SWE in theColumbia Mountains and Southern Rockies). Mudryk et al. [2015] found relativelygood correlations across the Northern Hemisphere among modern reanalyses suchas ERALand and MERRA but not as high for GLDAS2. Other studies have foundsuperior performance of snow models driven by ERAInterim forcings with betterSWE representation realized by improvements to the snow processes in the modelsthemselves [Brun et al., 2013, Dutra et al., 2012].The calculated metrics determine a performance ranking as follows: ERALand,GLDAS2, MERRA, CMC, GLDAS1, MERRALand, GlobSnow and ERAInterim.A 95% confidence interval was determined for the difference in each statistic be-tween each pair of products. The differences were found to be statistically signifi-40cant except as indicated in Table 2.5. This ranking informs applications requiringselection and prioritization of one or more of these SWE gridded products overBC or regions of similar climate and topography. One such application will seekto combine gridded products to create improved estimates of SWE over BC us-ing a data fusion approach based on machine learning. Various combinations ofproducts will be tested to determine the best performing model both in terms ofaccuracy and speed. By focusing on combinations of the best performing products,the development time of such a model is expected to be reduced.41Chapter 3Data Fusion of Gridded Products3.1 IntroductionIn areas of the world with significant alpine regions, many drainage basins exhibita predominantly nival regime. Under such conditions, which are characteristic ofthe province of British Columbia (BC), Canada, accurate snow water equivalent(SWE) estimation is particularly crucial for water supply, hydropower generationand flood forecasting and planning purposes. The ability of current methods andproducts to give accurate SWE estimates is limited by a number of topographicand climate factors. Forest cover, topography and large SWE accumulations posechallenges for physical models containing multiple simplifications and parameter-izations, as well as for satellite-based products, which can experience signal mask-ing and wash-out [Chang et al., 1996, Derksen, 2008]. Likewise reanalysis andassimilation-based products are susceptible to biases arising from various struc-tural limitations (e.g. elevation biases tied to spatial resolution) and uncertaintiesin the climate mean state [Dutra et al., 2011, Reichler and Kim, 2008]. Mudryket al. [2015] compared various gridded products across the Northern Hemisphereand observed large spreads in SWE with magnitudes on the order of the SWE in-terannual variability. These spreads were seen particularly in alpine regions withcomplex topography and in the Arctic, a region of large data gaps.A similar set of gridded products was examined and ranked in Chapter 2 overthe rugged, forested topography of British Columbia. Product correlations, biases42and MAEs were determined by comparison with in situ manual snow surveys andreported by survey month and physiographic region. While the best products weregenerally superior performers across the determined statistics and the majority ofregions, all products were found to underestimate SWE and imperfectly representinterannual fluctuations, particularly in the regions of highest snow accumulation.Against this backdrop, the current work set out to improve the accuracy of SWEestimates in the same topographically complex region by combining an ensembleof products with relevant site covariates utilizing a data fusion approach.Numerous models and estimation techniques have attempted to better charac-terize SWE. Nascent satellite-based efforts involved the use of multiple linear re-gression (MLR) to retrieve snow parameters from passive microwave data. Thoughnewer platforms have become available, many of the early algorithms [Chang et al.,1987, Foster et al., 1997, Tait, 1998] rely on the Defense Meteorological SatelliteProgram (DMSP) Special Sensor Microwave/Imager (SSM/I) brightness temper-ature data, which is a long passive microwave record covering July 1987 to De-cember 2015 derived from intercalibrated measurements across several spaceborneplatforms [Wentz, 2013]. While the early linear approaches performed well in ar-eas of low snow accumulation, forest cover and topographic relief, mountainousregions with dense vegetation and deep snowpacks were not as well represented.Machine learning methods began to become an important tool in the envi-ronmental sciences in the 1990s and have since spread to many application areas[Hsieh, 2009]. Early applications of machine learning methods for retrieving SWEwere developed by Chen et al. [2001] and Guo et al. [2003] using artificial neuralnetworks (ANNs). Inputs were based on a snow hydrology model, and outputsderived from a dense media radiative transfer model were iteratively computed un-til the ANN results converged on observations. The model improved on previousefforts by improving grain size representation and applying a spatially distributedsnow accumulation and melt model. Tedesco et al. [2004] compared SWE retrievedfrom an ANN to values from the spectral polarization difference (SPD) algorithm[Aschbacher, 1989], the Helsinki University of Technology (HUT) snow emissionmodel [Pulliainen et al., 1999], and the Chang algorithm [Chang et al., 1987, Fosteret al., 1997]. The ANN used SSM/I 19 GHz and 37 GHz vertical and horizontalbrightness temperatures for inputs and the national operational snow observations43of the Finnish Environment Institute or HUT simulated brightness temperatures asthe target data. Under dry snow conditions enforced by only considering days onwhich the maximum temperature was lower than −5 °C, the ANN outperformedthese other methods when trained with observations. Evora et al. [2008] developeda data fusion modeling framework utilizing ANNs, passive microwave data andgeostatistics. Predictors for this framework were seven passive microwave chan-nels and the interpolated minimum temperature. The target SWE field was derivedfrom snow depth and density measured manually at snow stations and interpolatedby a kriging with external drift (KED) algorithm using elevation as the secondaryvariable. More recently Tong et al. [2010] compared SWE predictions of ANNsusing microwave brightness temperatures (TBs) to those of the SPD algorithm As-chbacher [1989] and other TB difference algorithms [Chang et al., 1987, Derksen,2008] in the Quesnel River Basin of BC, finding that ANNs which included themost TB channels outperformed other networks and algorithms. Binaghi et al.[2013] applied radial basis function networks to estimate snow cover thicknessin the Italian Central Alps, finding that this approach outperformed inverse dis-tance weight and spline interpolation methods commonly used in similar contextswith limited numbers of homogeneously distributed measurement sites. These andother studies covered in various review papers [Gan, 1996, Evora and Coulibaly,2009, Shi et al., 2016] demonstrate the promise of more accurate snow estimatesvia machine learning methods, but they do not incorporate existing SWE productsdirectly.Manual snow surveys, taken on courses extending tens of meters, cannot beaccurately represented by large-scale gridded products with resolutions on the or-der of tens of kilometers [Mudryk et al., 2015]. Inherent SWE differences can beintroduced by disparities between mean grid cell elevation and station elevation, aswell as other topographic controls. Even higher resolution covariates such as theGLOBE 1-km DEM [Hastings et al., 1999] still fail to adequately reflect importantphysical contexts at survey sites (e.g. local slope and aspect) which can signifi-cantly affect measured values. In spite of the scale mismatch, the combination oflarge-scale gridded products with covariates such as finer terrain information andphysiographic context may capture the spatiotemporal patterns in the manual snowsurveys. Scale issues are further addressed in Section objective of this work is to apply data fusion techniques to readily avail-able gridded SWE products, manual snow surveys and covariates in order to betterestimate SWE in BC. Means of six products are evaluated, as are means of the bestthree of these products based on rankings from Chapter 2. Relevant covariates arefurther combined with these sets of products in MLR models as well as nonlinearANNs. The assumptions and limitations underlying this approach are discussed indetail in Section Study AreaThe study area, namely the province of BC, can be separated into five physio-graphic regions, which are shown in Figure 3.1. These regions vary considerablyin terms of their topography and climate, as well as the amount of snow they re-ceive, as reported in Chapter 2. Along the entire western edge of BC, the CoastMountains and Islands is a highly rugged series of mountain ranges and troughswith a mean station SWE of 782 mm in April. The Columbia Mountains andSouthern Rockies to the southeast is a very rugged region with high mean eleva-tion and a mean measured April SWE of 565 mm. Possessing a similarly moderate337 mm SWE measured during April surveys, the Northern and Central Plateausand Mountains is relatively flat along the plateaus in the north with low mountainranges scattered throughout. The Interior Plateau in the middle of the province ismostly flat except in the east and has a more moderate April mean station SWE of283 mm. To the northeast of the province, the Interior Plains of British Columbia,referred to in this work as the “BC Plains”, is the region of lowest relief as well asSWE accumulations, having a survey station average of 88 mm in April. Furtherdescriptions of the study area are detailed by British Columbia Ministry of Forests[1995] and Holland [1964].3.3 MethodsVarious publicly available SWE data sets have global to hemispheric coverageand can provide information on snow spatiotemporal distribution over BC. Thisstudy considered four reanalysis products (ERAInterim, ERALand, MERRA andMERRALand), a land data assimilation system (GLDAS2), and an observation-45Figure 3.1: Physiographic Regions of British Columbia, Canada, overlaid on theGLOBE 1-km DEM. Elevations are shown according to the legend with units ofmeters above sea level. The five physiographic regions are outlined in red: (A)Coast Mountains and Islands, (B) Columbia Mountains and Southern Rockies,(C) Northern and Central Plateaus and Mountains, (D) Interior Plateau, and (E)BC Plains. SWE accumulations generally increase from region A to region E.Outlined in blue is the VIC domain run by the Pacific Climate Impacts Consor-tium, covering four basins throughout BC: (I) the Peace, (II) upper Columbia,(III) Fraser and (IV) Campbell River watersheds. Manual Snow Survey stationlocations are indicated by green dots. Inset: A map of Canada (source: Wiki-media Commons) shows BC highlighted in red. The province is bordered to thesouth by the continental United States (USA) and to the west by the US state ofAlaska (AK).46based product (GlobSnow). Key predictors of snow distribution and evolution arethese gridded SWE products, and the target data are manual snow surveys con-ducted throughout the province.3.3.1 Input DataThe following gridded SWE products were used as predictors in this study andare described in further detail in Chapter 2. ERA-Interim [Dee et al., 2011], areanalysis product of the European Centre for Medium-Range Weather Forecasts(ECMWF), assimilates land surface, oceanographic, atmospheric and spacebornemeasurements from numerous sources using the Integrated Forecast System (IFS)at T255 spectral (∼ 80 km) horizontal resolution. ERA-Interim/Land [Balsamoet al., 2015] is an offline rerun of ERA-Interim using an improved land surfacemodel known as HTESSEL (Hydrology-Tiled ECMWF Scheme for Surface Ex-changes over Land) and precipitation adjustments based on the Global Precipita-tion Climatology Project (GPCP) v2.1. The Modern-Era Retrospective analysisfor Research and Applications, or MERRA [Rienecker et al., 2011] ingests a hostof land, oceanic and atmospheric data using the Goddard Earth Observing SystemData Assimilation System Version 5 (GEOS-5) at a resolution of one-half degreelatitude by two-thirds degree longitude. MERRA-Land [Reichle et al., 2011] isan offline land surface simulation based on MERRA which uses the land surfacemodel Fortuna-2.5 and makes precipitation adjustments based on the NOAA Cli-mate Prediction Center “Unified” (CPCU) product. The Global Land Data Assimi-lation System Version 2, known as GLDAS-2 [Rodell et al., 2004] and produced byNASA, runs forcings from the Princeton meteorological data set [Sheffield et al.,2006] on one of four land surface models, including the 0.25° Noah implemen-tation that is used in this study. GlobSnow [Takala et al., 2011], the “mountainsunmasked” data set provided by the Finnish Meteorological Institute (FMI), relieson a background field of snow depths created using synoptic observations and aforward model of satellite-based microwave radiometer measurements to constructa 25 km horizontal resolution SWE field. Other evaluated products in Chapter 2,GLDAS-1 and CMC, do not cover the early part of the study period (1980-2010)and hence are not included in this work.47The target data are manual snow surveys (MSS) conducted by the Snow SurveyNetwork Program for the BC River Forecast Centre. At each designated surveysite a column of snow is extracted using a Standard Federal Snow Sampler andweighed to determine SWE. The mean of SWE measurements at five to ten pointsalong a snow course is calculated to find the station areal average, which is thenreported. Surveys are taken up to eight times a year, once at beginning of monthsJanuary through June and once mid-month in May and June, and they are currentlyconducted at 167 stations throughout the province with another 216 stations withhistorical but no current measurements. Sampling frequencies and temporal rangesvary considerably by station, with some records dating back to 1935.Automated snow pillow data were also examined as a part of this work. Thesedata were found to be considerably more prone to obvious errors than the manualsnow surveys. Such errors included negative values, snow accumulation and meltcurves that contained sudden jumps, drops and unrecognizable noise, missing val-ues that were sometimes interpolated and other problems. In addition to passingsuch errors into the model, the likelihood exists to introduce redundancy, as manyof the snow pillows are co-located with manual snow survey sites. Consequently,only the manual snow surveys were used as target data.3.3.2 Artificial Neural Network ConstructionAn ensemble ANN (hereafter ANN) was constructed following Cannon and McK-endry [2002] using the implementation by Cannon [2015]. ANN topology con-sisted of an input layer, one hidden layer and an output later with a single nodefor SWE. A single hidden layer is adequate to model any nonlinear continuousfunction [Hsieh, 2009]. The hyperbolic tangent sigmoid function was used as thehidden layer activation function, the identity function was used for the output layer,and predictor and predictand variables were standardized (zero mean and unit stan-dard deviation) prior to model training. Initial model weights were set randomlyin the range of ±√0.8/din, where din is the number of inputs [Thimm and Fiesler,1997]. The optimization function employed was the Broyden-Fletcher-Goldfarb-Shanno method or “BFGS method” [Fletcher, 1970, Nash, 1979]. Training datasets for the individual ANN ensemble members were created using the block boot-48strap [Davison and Hinkley, 1997], with each block made up of all training casesfor a given station. To prevent overfitting, early stopping was used to regularizeindividual ensemble members; ANN training was stopped when validation error,as monitored on the out-of-bootstrap cases, reached a minimum. The optimal num-ber of hidden nodes for each test split was determined by minimizing the ensem-ble’s out-of-bootstrap RMSE over the training data. Following model selection andtraining, predictions from the 50 ensemble members were averaged for each testsplit, and statistics for each test split station were calculated accordingly.In total, manual snow survey data from 386 stations across BC was available fortraining and testing. Of those stations 256 had at least ten years of measurementswithin the overlapping period of record of the six gridded products (1980–2010).This was considered to be an adequate length to generate representative statisticsfor testing the ANN by cross-validation. Ten test splits were created by selectingevery tenth station, stratified by watershed, such that for each split one-tenth of thestations were excluded from training. The result was an approximately uniformspread of test stations throughout the province in each split, as shown in Figure 3.2(green dots). Inevitably there were large areas with little in situ data, particularlyalong the coast and in the BC Plains to the north, but the use of all available dataoutside the test split maximized the spatial coverage of the training set. The MAEand April survey correlation reported at each station were computed using predic-tions from the ANN which did not include data from that station in its trainingset.Due to their coarse spatial resolution, the gridded products tend to capturebroad spatiotemporal SWE patterns over BC, but values still exhibit large biaseswith respect to MSS observations, as shown in Chapter 2. In addition to the grid-ded SWE products, the following base set of predictors were included in the model:elevation, elevation bias, latitude, longitude, physiographic region, year, and sineand cosine of 2pi times the elapsed year fraction. These predictors representedmajor spatial and temporal dimensions to the ANN training, as well as broad loca-tion context information (e.g. physiographic region). Predictor screening was notnecessary because relatively few base covariates were identified, and correlationsbetween predictors would not diminish model performance. Station elevation bi-ases were calculated as the grid cell mean elevations found from the GLOBE 1-km49lllllllllllllllllllll l llll+++++++++++++++ ++++++++++505254565860llllllll lllllllllllllllll l++++++++++++++++++++++++505254565860lllllllllll llllllll lllllll++++++++++++++ +++++++++ +505254565860llllllllllllllllllllllllll+++++++++++++ +++++++++++505254565860llllllllllllllllll lllllll+++++++++++++ ++++++++++505254565860135 130 125 120 115llllllllllllllllll ll lllll++++++++++++++ +++++++++llll llllllllllllllll llll+++++++++++ +++++++++ ++ + +lllllllllllllllllllllllllllllllllll llllllllllllllllllllll lllllllllllll lllll135 130 125 120 115Longitude (deg−W)Latitude (deg−N)Figure 3.2: MSS station locations used in evaluating combination products. Greendots show the distribution of ten test splits for comparing means, MLRs andANNs to gridded SWE products throughout BC by cross-validation. Bluecrosses show the distribution of seven test splits for comparing ANNs to VICruns in four BC watersheds. Physiographic regions are outlined in red.50DEM [Hastings et al., 1999] minus the reported station elevations, and elevationbiases corresponding to each included product were used as predictors. Physio-graphic regions were represented in the model using discrete logical variables foreach region, so-called 1-of-c coding as described in Bishop [1995]. These pre-dictors provide specific spatial and temporal information as well as broad locationcontext information.From a temporal perspective, the most important predictor of seasonal SWE isthe fraction of the year elapsed at the time of observation. This elapsed year frac-tion is modeled using trigonometric predictors (sine and cosine) as it is a periodicsignal. The stationarity of each utilized data set was not evaluated in the course ofthis study. In order to handle the prospect of non-stationary data, linear or other-wise, the value of the observation year was included as a predictor, allowing theANN to account for changes in the joint probability distribution over time.The assumptions underlying this approach include that the SWE field can bereliably resolved by an ANN topology that includes a single hidden layer, whichis suitable for modeling any nonlinear continuous function [Hsieh, 2009]. Overfit-ting is assumed to be adequately mitigated using early stopping and an ensembleof 50 members. It is also implicitly assumed that enough predictors are employedto adequately cover the solution space, though the number of predictors is less im-portant than the quality of those predictors. Outside the given study area and timewindow SWE estimates are less reliable, as properly trained ANNs perform non-linear interpolation well but extrapolate poorly, potentially leading to significantdeviations from the true signal beyond the bounds of the predictor training space[Hsieh, 2009].As previously mentioned, MSS values found at sites on the order of ten me-ters do not adequately represent SWE averaged across gridded product cells. Sinceelevation is a key indicator in both SWE accumulation and melt [Anderton et al.,2004], among the most apparent scale issues is the difference between survey siteelevation and grid cell mean elevation. Including such elevation differences aspredictors adds important site-specific context but also carries limitations in cap-turing interactions of topography and the atmosphere. Local elevation minima andmaxima may be subject to wind redistribution from peaks to valleys dependingon surrounding topography, and weather systems drop different amounts of snow51on windward and leeward mountain slopes. Nevertheless, elevation and grid cellelevation differences may provide a basic indication of subgrid SWE variability.Another important piece of local context is location within the province, key co-variates of which include latitude, longitude and physiographic region.Efforts to include additional local information were limited by lack of extensivesite-specific metadata and appropriate regional scale representations. For instance,while various ground cover products are available, the actual site forest cover ishighly dependent on snow course location, time since last maintenance and otherfactors, which vary considerably across the MSS data set (Tony Litke, personalcommunication, 2016). As a result, only this base set of covariates has been em-ployed. The main focus of this work, however, was to find which combination(s)of products lead to the best result. While other covariates may incrementally im-prove the ANN performance, use of this base covariate set should suffice for thispurpose.Table 3.1: Mean station MAEs (in mm SWE) for individual and combinations ofproducts and corresponding MLR and ANN runs, with the mean station MAEscomputed using only the VIC stations given in parenthesis. Products shown areERAInterim (E), GlobSnow (G), MERRALand (ML), MERRA (M), GLDAS2(G2) and ERALand (EL).Gridded ProductsE GS ML M G2 EL Mean Product MLR ANNX 411 (400) 214 (217) 184 (184)X 407 (396) 220 (218) 196 (194)X 362 (349) 211 (209) 178 (180)X 315 (303) 196 (191) 172 (168)X 308 (294) 206 (205) 177 (176)X 305 (290) 202 (201) 173 (176)X X 292 (282) 196 (194) 171 (173)X X 295 (287) 192 (186) 166 (163)X X 295 (287) 190 (186) 170 (165)X X X 289 (281) 188 (185) 163 (159)X X X X 302 (295) 189 (185) 165 (162)X X X X X 316 (311) 189 (186) 167 (167)X X X X X X 328 (324) 189 (186) 168 (166)The gridded SWE product combinations shown in Table 3.1 were used as pre-52dictors in the ANNs. Multiple linear regression (MLR) models were run using thesame sets of predictors and test station splits, and their results are listed along-side those of the ANN models for comparison of linear and nonlinear data fusiontechniques. The predictor combinations included each gridded product run indi-vidually as well as in combination with better performing products. In addition,each two-product combination of the three best products was run. The rationalefor this approach was the desire to limit the total number of ANN runs to substan-tially less than all 63 possible combinations of any number of products, runningthose most likely to give performance insights and improvements. The three bestperforming products had mean station MAEs across BC that were within 5% ofeach other, whereas other products had mean station MAEs 15–30% higher thanthe top product, ERALand. Furthermore, it was not anticipated, for instance, thatadding the poorest performing product to an ANN of the best three would outper-form an ANN of the best four. Only observations for which all six gridded productshad SWE values within seven days of the observation date were used, so that allmodels were based on the same set of observations.A number of alternative machine learning methods were investigated in thecourse of this work. Bayesian Neural Networks (BNN) and Support Vector Ma-chines (SVM) runs were executed but did not result in notable improvements overthe ANN in spite of additional computational cost. While studies [e.g. Xue andForman, 2015, Forman and Reichle, 2015] have shown machine learning meth-ods such as SVM to be superior over ANNs for snow-related parameters, Limaet al. [2015] found that in an evaluation of four different non-linear methods acrossnine different environmental data sets, no single non-linear method consistentlyoutperformed the others. Though an exhaustive exploration of machine learningalgorithms was not the objective of this manuscript, the machine learning runscompleted indicated that, of the investigated algorithms, the ANN was at leastcomparable if not superior for this application and data sets.3.3.3 Hydrologic Model ComparisonThe Variable Infiltration Capacity (VIC) hydrologic model is used by the PacificClimate Impacts Consortium (PCIC) to conduct hydrologic assessments for the53province of British Columbia [Pacific Climate Impacts Consortium, 2014]. VIC isa macroscale hydrologic model that solves full surface water and energy balances[Liang et al., 1994], sharing many basic features with land surface models (LSMs)in global climate models (GCMs). The land surface is modeled as a grid of large,flat, uniform cells, and sub-grid heterogeneity is handled via statistical distribu-tions. The land-atmosphere fluxes and water and energy balances are simulated ata daily or sub-daily time step. Water can only enter a grid cell via the atmosphere,which means that water in the channel network stays in the channels and subsur-face flow between grid cells is not included in the water balance. For the 1/16° gridcells used in the PCIC implementation, this is a reasonable approximation.The PCIC VIC implementation currently covers four watersheds throughoutBC [Pacific Climate Impacts Consortium, 2014], as shown in Figure 3.1. Relevantmodel parameters, specifically snow/rain temperature thresholds and snow albedodecay curves, were explicitly adjusted to reproduce observed SWE at snow pillows,with subsequent automated multi-objective calibration to reproduce streamflow se-ries and volume errors [Schnorbus et al., 2010]. The model is forced with dailyobservations for minimum and maximum temperature, precipitation, and wind ona 1/16° grid [Schnorbus et al., 2014].A second set of ANN runs was conducted using a subset of the test stationspreviously used. Only the MSS stations that were within the VIC domain and thathad at least ten years of measurements were used for testing. The 173 stationswhich met this criterion were divided into seven test splits of 24 or 25 stationseach, ordered by watershed, such that for each split one-seventh of the VIC longrecord stations were excluded from training. This selection results in a relativelyuniform spread of test stations for each split over the VIC domain as shown inFigure 3.2 (blue crosses). Model training and MAE and correlation computationfor each station were performed as in the full-province runs.3.4 ResultsResults of ANN runs for different combinations of products in this study are shownin Table 3.1. The gridded products used are listed in order of increasing perfor-mance across the entire province: ERAInterim (E), GlobSnow (GS), MERRA-54Land (ML), MERRA (M), GLDAS2 (G2) and ERALand (EL). Mean station MAEsare presented for single and average gridded product SWE values plus MLRs andANNs constructed using test splits of 256 stations across BC. Mean station biaseswere found to mirror mean station MAEs, as errors are largely the result of under-estimates of SWE in most regions of the province (see Figure S1 in Snauffer et al.[2017]). MAEs are also shown for those same combinations based on test splits of173 stations within the VIC domain. Using the individual products in an ANN witha base set of relevant spatiotemporal covariates improves the mean station MAE by43% to 55%. ANNs and MLRs based on combinations of the products further im-prove the mean station MAE, and MAEs for the ANNs are consistently lower thanthe corresponding values for the MLR models. While ANNs using multiple prod-ucts all outperform those using single products, the best performance was achievedby a combination of the three best products: MERRA, ERALand and GLDAS2.The differences in mean station MAE between the three-product ANN and ANNsthat use more products are very small, suggesting that adding the products that donot perform as well does not bring additional useful information into the model.These results are consistent both for ANNs that use test splits of all 256 stationsand for those that rely on test splits of the 173 VIC stations.While comparing snow statistics aggregated across BC is useful for broad com-parisons, SWE is a highly inhomogeneous field due to climatic and topographicvariability in the province. Figure 3.3 illustrates the variability at stations acrossthe five physiographic regions of the province. SWE values from manual snowsurveys, gridded products and combinations are plotted for representative stations,each of which has a station MAE close to the mean for the encompassing region.Regions are ordered by decreasing mean SWE accumulations in panels (a) through(e). The time interval shown covers snow seasons of 1999, a year of relatively highaccumulations across the province, to 2005, a relatively low year. A wide range ofaccumulations across the regions was observed, with a peak SWE of 50 to 150 mmobserved in the BC Plains (panel e) while 500 to over 3000 mm SWE was measuredin the Coast Mountains and Islands (panel a). Significant temporal disparities werealso seen. Peak SWE in the Coast Mountains and Islands in 1999 was over fivetimes that of 2005, while there is less than a 40% difference for these years in theBC Plains. Underestimates in peak SWE are apparent for most gridded products55and snow seasons. The largest underestimates were in the Coast Mountains andIslands and the Columbia Mountains and Southern Rockies (panels a and b respec-tively), the regions of heaviest snow. The MLR model using the combination of thethree best products (MLR3) mostly improved the underestimates of SWE apparentat these highest accumulation stations. The MLR still significantly underpredictedSWE in the two regions of highest accumulations but then also overpredicted in thethree other regions with lower accumulations, particularly in the BC Plains (panela). The ANN using the three best products (ANN3) partially corrected these er-rors, estimating lower SWE values at the stations averaging less snow and higherSWE values at those with more snow relative to MLR3. Very high accumulationseasons are still particularly challenging for both products and statistical models,especially 1999 in panels (a) and (b). This finding indicates that while the ANN isbetter able to estimate SWE in most contexts, the very highest SWE values remainchallenging to assess.Further analyzing mean station MAE and April correlation by physiographicregion may reveal spatial dependencies and in turn give insight into model perfor-mance. Mean station MAEs (Figure 3.4) and April correlations (Figure 3.5) areshown for the five major physiographic regions of BC in order of decreasing meanSWE accumulations (panels a through e) and for the province as a whole (panelf). In order to measure the ability of the gridded products to capture peak SWEinterannual variability, correlations were calculated using time series constructedfrom April surveys only, as April is the survey of highest mean SWE for all re-gions, as found in Chapter 2. Mean station biases were also calculated but foundto mirror mean station MAEs, likely due to underestimates of SWE across mostof the province (see Figure S1 in Snauffer et al. [2017]). Whiskers on the barsrepresent 95% confidence intervals determined by n = 5000 bootstrap samples asdescribed in Jolliffe [2007]. Blue bars show mean station values for the six SWEgridded products, with dark blue highlighting the three best performers. Across thefour physiographic regions of highest accumulations, ERALand, GLDAS2 ANDMERRA are consistently the best performers in both mean station MAE and Aprilcorrelation values, though in many cases the 95% confidence intervals overlap. Inthe BC Plains, the region of lowest snow accumulations, GlobSnow has the lowestMAE and a similar April correlation to that of MERRA and ERA-Interim, though560100020003000SWE (mm)SWE (mm)(a)05001000SWE (mm)SWE (mm)(b)0200600SWE (mm)SWE (mm)(c)MSSMERRAGLDAS2ERALandMLR3ANN30100200300400SWE (mm)SWE (mm)(d)1999 2000 2001 2002 2003 2004 2005050150250SWE (mm)SWE (mm)(e)Figure 3.3: SWE time series for manual snow survey stations, the best three griddedSWE products, MLR3 and ANN3. Gridded products are ordered by increasingoverall performance, with darker blues for better performing products. MLR3and ANN3 estimates are masked outside the mean first and final survey dates(January 2 and June 15 respectively). A representative station is shown for eachof the five physiographic regions of BC: (a) station 1D08 Stave Lake in theCoast Mountains and Islands, (b) station 2A25 Kirbyville Lake in the ColumbiaMountains and Southern Rockies, (c) station 4B08 Mount Cronin in the North-ern and Central Plateaus and Mountains, (d) station 2F11 Isintok Lake in theInterior Plateau, and (e) station 4C05 Fort Nelson Airport in the BC Plains.5702004006008000200400600800E GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(a) Coast Mts & IslandsMean Station MAE (mm SWE)01002003004000100200300400E GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(d) Interior Plateau01002003004000100200300400E GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(c) North Plateaus & MtsMean Station MAE (mm SWE)01002003004000100200300400E GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(e) BC PlainsMean Station MAE (mm SWE)02004006008000200400600800E GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(b) Columbias & S Rockies01002003004000100200300400E GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(f) BC−wideFigure 3.4: Mean station MAE for several SWE products/combinations for regionsof BC in order of descending mean accumulations: (a) Coast Mountains andIslands, (b) Columbia Mountains and Southern Rockies, (c) Northern Plateausand Mountains, (d) Interior Plateau, (e) BC Plains, and (f) province-wide (com-bining all data from the five preceding panels). Shown are MAEs with 95% con-fidence intervals based on n = 5000 bootstrap samples for the six gridded prod-ucts in blue: ERA-Interim (E), GlobSnow (GS), MERRALand (ML), MERRA(M), GLDAS2 (G2) and ERALand (EL). Dark blue indicates three best perform-ing products. Also shown are three fusion techniques (mean, MLR and ANN)using all six products (green) and the three best performing products (brown).Note the vertical scale in panels (a) and (b) is double that of panels (c) through(f).580. GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(a) Coast Mts & IslandsMean Station April Correlation GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(d) Interior Plateau0. GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(c) North Plateaus & MtsMean Station April Correlation GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(e) BC PlainsMean Station April Correlation GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(b) Columbias & S Rockies0. GS ML M G2 ELMean6MLR6ANN6Mean3MLR3ANN3(f) BC−wideFigure 3.5: Mean station April Correlation for several SWE products/combinationsfor regions of BC in order of descending mean accumulations: (a) Coast Moun-tains and Islands, (b) Columbia Mountains and Southern Rockies, (c) NorthernPlateaus and Mountains, (d) Interior Plateau, (e) BC Plains, and (f) province-wide. Product and combination abbreviations and colors are as in Figure 3.4.59the differences in this region are relatively small. The green bars represent differ-ent ways of combining all six products: mean, MLR and ANN, while the brownbars have similar crosshatching but use only the three best performing products. Interms of mean station MAEs, the regional story is again similar to that of the entireprovince for the four regions of highest accumulations. MAEs for the three-productmeans are slightly lower than those of the six-product means. MLRs show notableperformance improvements over multi-product means and are virtually identicalfor both three- and six-product combinations. ANNs further show improved per-formance over the MLRs. Three-product ANNs perform very slightly better thanthe six-product networks and are the best performers of all ANNs. For the BCPlains, both three- and six-product means produce results comparable to those ofthe individual products, but the MLRs and ANNs have a higher mean station MAE.This result is likely due to the fact that relatively few stations exist in this region,causing the models to be less well trained in the BC Plains. In terms of April cor-relations, however, the combination products are relatively high across all regions,with the three-product means and ANNs generally being the best performers.The relative performance of each gridded or combination product relative to areference was found using the confidence intervals for differences approach of Jol-liffe [2007]. Briefly, 5000 bootstrap samples of the mean station MAE and Aprilcorrelation differences between each product and the reference were plotted onbox plots (Figures 3.6 and 3.7), in which the boxes extend between lower and up-per quartiles and the whiskers span 95% of the distribution. If the whiskers do notcross the origin, the difference in the product and reference is said to be significant,rejecting the null hypothesis at the 5% level. The mean of all six products (Mean6)was used as the reference and was compared to the remaining gridded and SWEfusion products. Negative mean station MAE differences (Figure 3.6) and positivemean station April correlation differences (Figure 3.7) indicate that the evaluatedproduct performed better than Mean6. The best three products significantly outper-formed Mean6 in MAE across the province and in some of its regions, but Mean6was often a better performer than the individual products in April correlation. Themean of these best three (Mean3) was consistently better than that of Mean6 underboth metrics, and this difference is statistically significant across the province andall regions except the BC Plains. The ANN using all six products (ANN6) and the60−300−1000100E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(a) Coast Mts & IslandsMAE Difference (mm SWE)−100−50050E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(d) Interior Plateau−100050100E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(c) North Plateaus & MtsMAE Difference (mm SWE)0204060E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(e) BC PlainsMAE Difference (mm SWE)−200−1000100E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(b) Columbias & S Rockies−200−100050100E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(f) BC−wideFigure 3.6: Bootstrap mean station MAE differences between several SWE prod-ucts/combinations and the reference (the mean of six products). First quartile,median and third quartile, plotted respectively as the bottom, waist and top ofeach box, as well as 95% confidence intervals indicated by whiskers, are shownfor regions of BC in order of descending mean accumulations: (a) Coast Moun-tains and Islands, (b) Columbia Mountains and Southern Rockies, (c) NorthernPlateaus and Mountains, (d) Interior Plateau, (e) BC Plains, and (f) province-wide. Product and combination abbreviations and colors are as in Figure 3.4.Note differences in vertical scale.61−0.6−0.4−0.20.0E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(a) Coast Mts & IslandsApril Correlation Difference −0.5−0.3−0.1E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(d) Interior Plateau−0.6−0.4−0.20.0E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(c) North Plateaus & MtsApril Correlation Difference −0.4−0.20.0E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(e) BC PlainsApril Correlation Difference −0.6−0.4−0.20.0E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(b) Columbias & S Rockies−0.5−0.3−0.1E GS ML M G2 ELMLR6ANN6Mean3MLR3ANN3(f) BC−wideFigure 3.7: Bootstrap mean station April Correlation differences between severalSWE products/combinations and the reference (the mean of six products). Quar-tiles and 95% confidence intervals are shown for regions of BC in order of de-scending mean accumulations: (a) Coast Mountains and Islands, (b) ColumbiaMountains and Southern Rockies, (c) Northern Plateaus and Mountains, (d) In-terior Plateau, (e) BC Plains, and (f) province-wide. Product and combinationabbreviations and colors are as in Figure 3.4. Note differences in vertical scale.62one using the best three products (ANN3) also significantly outperformed Mean6and slightly outperformed corresponding multiple linear regressions (MLR6 andMLR3) province-wide and in all physiographic regions except for the BC Plains.In the BC Plains MLRs were slightly better than Mean6 in April correlation but sig-nificantly underperformed in MAE. ANN performance in this region was similar,though the underperformance in MAE was less notable. This failure of the MLRsand ANNs to perform well in MAE was likely due to the lower accumulations ofSWE here, accompanied by the scarcity of manual snow survey stations.A closer examination of ANN performance revealed improvements over com-parable MLR. Figure 3.8 shows the results of n = 5000 bootstrap samples of meanstation MAE and April correlation differences between the ANNs and MLRs usingsix products (panels a and c) and the best three products (panels b and d). Negativemean station MAE differences (panels a and b) and positive mean station Aprilcorrelation differences (panels c and d) indicate the regions in which the ANNperformed better than the comparable MLR. ANN improvements over MLR wereseen across all regions, and those improvements were statistically significant inmost regions and across the province. While non-significant differences betweenANNs and MLRs were found for the best three product MAEs and correlations inthe Northern and Central Plateaus and Mountains (panels b and d) and for bothcombinations’ correlations in the BC Plains (panels c and d), median ANN im-provements over MLR were observed. This result suggests nonlinear modeling ofthe SWE survey data set may merit further investigation as a tool to better estimateprovince-wide and regional SWE.The performance of the six product and best three product ANNs were alsocompared to VIC runs for the four major watersheds in BC shown in Figure 3.1[Pacific Climate Impacts Consortium, 2014]. As these four watersheds do not coverall of the available manual snow stations, station test splits were redrawn and theANNs runs were conducted separately from those mentioned above. Figure 3.9shows the results of n = 5000 bootstrap samples of mean station MAE and Aprilcorrelation differences between the ANNs and VIC using six products (panels aand c) and the best three products (panels b and d). Negative mean station MAEdifferences (panels a and b) and positive mean station April correlation differences(panels c and d) indicate the regions in which the ANNs performed better than VIC.63−80−60−40−200MAE Difference (mm SWE)CM CR NP IP BP BC(a)−80−60−40−200MAE Difference (mm SWE)CM CR NP IP BP BC(b) Correlation Difference CM CR NP IP BP BC(c) Correlation Difference CM CR NP IP BP BC(d)Figure 3.8: Box plots of bootstrap station differences in MAEs and April correla-tions showing quartiles and 95% confidence intervals for ANNs of six products,ANN6 (green), and of the best three products, ANN3 (red), with respect toMLR6 and MLR3 respectively. Bootstrap difference distributions are shownfor (a) ANN6−MLR6 MAEs, (b) ANN3−MLR3 MAEs, (c) ANN6−MLR6April correlations and (d) ANN3−MLR3 April correlations. Shown arethe differences for each physiographic region: Coast Mountains and Islands(CM), Columbia Mountains and Southern Rockies (CR), Northern and CentralPlateaus and Mountains (NP), the Interior Plateau (IP), and the BC Plains (BP),as well as province-wide (BC).64−100−50050MAE Difference (mm SWE)CM CR NP IP BP BC(a)−100−50050MAE Difference (mm SWE)CM CR NP IP BP BC(b)− Correlation Difference CM CR NP IP BP BC(c)− Correlation Difference CM CR NP IP BP BC(d)Figure 3.9: Box plots of bootstrap station differences in MAEs and April correla-tions showing quartiles and 95% confidence intervals for ANNs of six prod-ucts, ANN6 (purple), and of the best three products, ANN3 (orange), with re-spect to VIC. Bootstrap difference distributions are shown for (a) ANN6−VICMAEs, (b) ANN3−VIC MAEs, (c) ANN6−VIC April correlations and (d)ANN3−VIC April correlations. Notation for the physiographic regions followsFigure 3.8.65Across the province, the ANNs outperformed VIC in MAEs and April correlationsacross most regions. The exceptions included MAEs in the Northern and CentralPlateaus and Mountains and BC Plains, where VIC performed slightly better thanthe ANNs, and April correlations in the Coast Mountains and Islands, where thetwo are nearly tied. Statistically significant differences were seen only for MAEsin the Columbia and Rocky Mountains and for April correlations in the Northernand Central Plateaus and Mountains. However, across the entire province ANN3significantly outperformed VIC in both MAEs and April correlations. This resultsuggests that consideration of a large enough data set and the use of the best griddedproducts for the province as inputs to an ANN can produce better estimates of SWEin BC, both spatially and temporally.3.5 ConclusionSix gridded products have been used to predict SWE at manual snow survey sta-tions using data fusion techniques. The products estimated lower snow accumu-lations than were measured, but ANNs combining a single gridded SWE productwith a base set of relevant predictors reduced the products’ mean station MAEs by43% to 55% (average 48%) across the province, improvements that are mirroredwithin most physiographic regions. ANNs which used multiple gridded productsas predictors performed better than those using single products. The best perform-ing ANN used the best three products (ERALand, MERRA and GLDAS2), withno improvement seen by including additional products in the ANN. This ANN wasfound to have a mean station MAE 47% to 60% (average 53%) lower than the sixindividual products considered in this study. Both MLRs and ANNs significantlyoutperformed the mean of six products in terms of MAEs and April correlationsacross the province and in four regions. Only in the BC Plains, a region of low ac-cumulations and few stations, did the products themselves have lower MAEs, whilestill underperforming the SWE fusion products in April correlations. The ANNsalso outperformed comparable MLRs using the same gridded products; this assess-ment was statistically significant across the province and in most physiographicregions. Comparing ANNs with runs of the hydrologic model VIC, it was foundthat the ANNs performed better across the province and in three regions, though66statistical significance was found in only one region and across the province for theANN using the best three products.While estimating SWE using a fusion of available gridded products and a baseset of relevant spatiotemporal covariates poses limits in context and process under-standing, from an operational standpoint it can potentially improve the representa-tion of SWE via a far more efficient approach. Further development of the ANNs,including the incorporation of additional covariates and data sources (e.g. terraininformation, a simple snow model, etc.), may result in further improvements to theestimation of snow in BC.67Chapter 4Enhancements with SnowModeling and Land Covariates4.1 IntroductionWhat are the most promising ways to improve large-scale estimates of snow ac-cumulations in complex regions? This is a fundamental question on the mindsof many water resources managers and planners, particularly for rugged areaswhich receive exceptionally high snowfall. Accurate estimates of winter snow wa-ter equivalent (SWE) are vital for hydrologic forecasting and planning especiallywhen the annual runoff signal is dominated by seasonal snowmelt. Numerous read-ily available data products provide such estimates on basin to hemispheric scales,but all are subject to their own limitations. SWE retrievals from passive microwavemeasurements via spaceborne instruments afford frequent global coverage but aresusceptible to signal masking and wash-out under conditions of heavy forest coverand very deep snowpacks [Chang et al., 1996, Derksen, 2008]. Reanalyses and dataassimilation-based products incorporate larger numbers of observations but can besubject to biases related to product resolution, sampling strategy changes, meanclimate state ambiguities and other factors [Reichler and Kim, 2008, Dutra et al.,2011]. Systematic evaluations of various gridded products have observed largespreads in estimated SWE, from hemisphere-wide studies [Mudryk et al., 2015] tothe regional scale addressed in Chapter 2.68A line of inquiry increasingly being pursued for the improvement of hydro-logic and snowmelt modeling and forecasting is the use of machine learning. Earlyefforts focused on precipitation and streamflow forecasting. Hsu et al. [1997] de-veloped an adaptive artificial neural network (ANN) capable of estimating hourlyrainfall intensity at a relatively fine resolution of 0.25°× 0.25° using infrared satel-lite imagery. Visible imagery was subsequently incorporated into the product, withongoing explorations of the use of a variety of input features including topogra-phy, wind direction, relative humidity and both observation data and estimatesprovided by numerical weather prediction models [Hsu et al., 1999]. In investi-gations of reservoir inflow forecasting Coulibaly et al. [2001] found that input timedelays significantly improved the conventional multilayer perceptron (MLP) net-work, though alternative algorithm formulations such as a recurrent neural network(RNN) were more effective for the studied prediction problem.The application of machine learning to cryosphere problems has begun to growwithin the hydrology community. Following Chapter 3, machine learning tech-niques have been applied to modeling snowmelt-driven reservoir inflows and stream-flow simulations [Dibike and Coulibaly, 2008] as well as snow thickness and SWEestimation. Binaghi et al. [2013] estimated snow cover thickness in the Italian Cen-tral Alps using radial basis function networks, finding better performance than stan-dard interpolation techniques typically used in sparse data regions. ANNs for SWEretrieval [Chen et al., 2001, Guo et al., 2003] mapped snow hydrology and densemedia radiative transfer model outputs to target observations, improving on snowgrain size representation and snow accumulation and melt distributions. Evoraet al. [2008] used microwave brightness temperature (TB) data and geostatistics inANNs with a target SWE field interpolated from snow depth and density measure-ments. SWE estimates derived from ANNs using spaceborne passive microwaveTB signals and ground measurements developed by Tedesco et al. [2004] and Tonget al. [2010] were found to compare favorably against state-of-the-art SWE re-trieval algorithms, including the spectral polarization difference (SPD) algorithm[Aschbacher, 1989], the Helsinki University of Technology (HUT) snow emissionmodel [Pulliainen et al., 1999] and other TB difference algorithms [Chang et al.,1987, Derksen, 2008]. Other approaches using machine learning methods to moreaccurately estimate snow have been further discussed in review [Gan, 1996, Evora69and Coulibaly, 2009, Shi et al., 2016], but none of the reviewed studies utilized ex-isting gridded SWE data products directly. The only study to use gridded productsdirectly in a machine learning algorithm was developed in Chapter 3, which foundthat among six gridded SWE products ranked across BC in Chapter 2, the com-bination of the top three constituted the best performing ANN across BC, thoughregional differences were not always statistically significant. Incorporation of addi-tional terrain covariates and a simple snow model were identified as possible meansto further improvements in snow estimates.Terrain elevation and topography are significant predictors of changes in snowcover. Varhola et al. [2010] reported smaller differences in snow accumulationbut larger differences in melting between the forests and clearcuts at higher alti-tudes. Topography can also complicate the remote sensing of the snowpack bywind redistribution (i.e. falling or blowing from higher peaks into deeper valleys)and can radically affect snowmelt rates and distribution during melt off. A bi-variate analysis of snow-related response variables in a catchment in the Pyreneesyielded the following primary indices: elevation, slope gradient, potential directsolar radiation input (related to latitude and aspect) and wind exposure [Ander-ton et al., 2004]. These indices, extracted from a high-resolution digital elevationmodel (DEM), suggested that there was a strong topographic control on SWE dueto wind redistribution and insolation-enhanced melting.Snow models of varying complexities are used to estimate and predict snowaccumulation and melt at local to global scales. Process-based energy balancemodels seek to comprehensively emulate the fundamental physical influences ofspatial and temporal variability in snow accumulation and melt. Improvements inthe estimation of snow accumulation and melt in such models are often the result ofbetter representations of the underlying physical processes. For instance Sultanaet al. [2014] studied an alternative snow surface temperature representation andmelt model formulation of the Utah Energy Balance (UEB) model in the Noahland surface model (LSM) version 2.7.1. By using the force-restore method forfinding snow-skin temperature and allowing for warming and ripening of the entiresnowpack before snowmelt onset, the UEB model improved the representation ofSWE in the Noah LSM. While the accuracy of such models is well established inthe literature [Barry et al., 1990, Blo¨schl et al., 1991, Walter, 1995], their extensive70meteorological input data and parameter tuning requirements frequently representstrong disadvantages for operational applications [Jost et al., 2012].On the other end of the snow model complexity spectrum are so-called temperature-index or degree-day models. Temperature-index models estimate snowmelt in pro-portion to the positive ambient air temperature as well as the elapsed time abovefreezing. Development of these models may include the incorporation of otherfactors that may increase or decrease snowmelt rates by adjusting the melt fac-tor. Examples include the consideration of potential direct solar radiation [Hock,1999], measured radiation [Rango and Martinec, 1995] and forest cover [Rowlandand Moore, 1992]. While energy balance snow models usually require complexdata sets and tuning, the inputs to temperature-index models are simpler, morereadily available and in the case of temperature, more or less reliably interpo-lated [Ferguson, 1999, Stahl et al., 2006a]. However the minimal data utilizedand lack of underlying process representations present limitations to the accuracyof temperature-index snow models.The compromise between accuracy and simplicity made by Walter et al. [2005]was to supply an energy balance snow model with very basic weather inputs andthen estimate other key data. The developed model required only minimum andmaximum temperatures, precipitation and location latitude. Energy balance simpli-fications included: solar radiation using Bristow and Campbell [1984] and Ndlovu[1994] approximations for atmospheric transmissivity; long wave radiation usingCampbell and Norman [1998] estimates for atmospheric long wave emissivity; sen-sible heat exchange using the Campbell [1977] expression for resistance to heattransfer; convective heat exchange using the Jensen et al. [1990] approximationfor saturation vapor density; and a constant ground heat conduction value of 173kJ m−2 day−1 [US Army Corps of Engineers, 1960]. The net outcome of thesesimplifications was that the resulting model required no more data and still per-formed better than a standard temperature-index model when compared to in situmeasurements.The issue of the scale mismatch in the comparison and use of relatively coarseresolution gridded SWE products alongside point-scale in situ snow observationsis an ongoing problem [Blo¨schl, 1999]. Snow courses on the order of ten metersin length cannot be reliably estimated by large-scale gridded products with reso-71lutions in the tens of kilometers [Mudryk et al., 2015]. Accumulation and meltare heavily influenced by local topographic and weather conditions, in particularelevation and the presence of wind [Anderton et al., 2004]. Gridded SWE datahave previously been compared with point SWE measurements primarily becauselarge-scale observations are not readily available [Pan et al., 2003, Sultana et al.,2014]. A recent review of scaling in the comparison of SWE estimation meth-ods at the mountain range scale by Wrzesien et al. [2017] concludes that whileit is impossible to completely overcome this mismatch, useful evaluations can beconducted provided appropriate caveats are made. Correlations of appropriatelysampled yearly time series are valuable for evaluating the capability of productsto capture interannual deviations. Pan et al. [2003] further argue that subgrid spa-tial variability in cold season processes over the larger scale can be accounted forby using a large number of geographically well spread stations. Therefore, com-parisons between gridded data products and point measurements should be madewith caution, and such measurements can arguably be used to improve on thoseproducts, despite the inherent scaling issues.The objective of this research is to enhance an ANN using three top perform-ing gridded SWE products by incorporating additional data sources that improverepresentations of local conditions and the time evolution of the snowpack. Thisenhanced ANN will be evaluated by determining mean absolute error (MAE) andinterannual April (peak) SWE correlation using manual snow surveys (MSS) asground truth. In developing an approach toward this objective, two hypotheseswere formulated:1. A simplified snow accumulation and melt model driven by basic weatherdata at a relatively high spatial resolution can complement sophisticated butlower-resolution gridded products in estimating SWE.2. High-resolution topographic data can complement relatively coarser griddedproducts in improving SWE estimation.These hypotheses will serve as a framework to test the ability of an ANN to fuse in-formation from complementary sources and explore the potential for performanceimprovements in the representation of SWE. Mean station MAEs and correlations72will be calculated for the entire province as well as for the domain of the VariableInfiltration Capacity (VIC) model, which covers four BC watersheds (Figure 4.1).VIC is a calibrated hydrologic model [Pacific Climate Impacts Consortium, 2014]that has been shown to simulate streamflow over evaluated BC subbasins with anaverage volume bias of less than 10% [Schnorbus et al., 2011]. VIC however re-quires a serious effort to set up, calibrate and maintain, plus substantial computertime to run. A comparison of SWE estimates from the ANN with those of VIC isworthwhile because the effort to set up and run the ANN is considerably lower.4.1.1 Study AreaLocated on the Pacific coast of Canada, British Columbia (BC) is bordered by thecontinental United States to the south and the US state of Alaska to the northwest(inset, Figure 4.1). At approximately 95.2 million hectares, this large and diverseprovince is broadly characterized by a flat to gently rolling central interior lyingbetween a rugged western coastline and several mountain chains along the easternborder with neighboring province Alberta. BC is comprised of five major physio-graphic regions. These regions are highlighted in red in Figure 4.1 and listed inTable 4.1 in order of descending mean SWE accumulations, along with the num-ber of manual snow survey stations, land area, mean Terrain Ruggedness Index TRI[Riley et al., 1999] and mean station SWE accumulation by survey month. AlongBC’s western margin including the Pacific seaboard, the Coast Mountains and Is-lands is the most rugged alpine region as measured by mean TRI and receivesthe highest snow accumulations, averaging nearly 650 mm SWE across Januarythrough June beginning-of-month mean survey values. The Columbia Mountainsand Southern Rocky Mountains on the Alberta border to the east is also a veryrugged alpine area with the highest mean elevation of 1599 meters above sea level(masl) when comparing all five regions. Snow accumulations averaging 468 mmSWE are the second highest in the province. At 28.6 million hectares, the Northernand Central Plateaus and Mountains region is the largest in BC and has accumu-lations reaching a mean of 330 mm SWE across monthly surveys. The InteriorPlateau has the second lowest mean SWE of 218 mm and 81 snow survey stations,the most in the province. The Interior Plains of BC, referred to in this work as the73“BC Plains”, is the flattest and lowest region with a mean elevation of 642 masl.It has the fewest stations, lowest station density and lowest mean SWE at 51 mm.Additional information on the physiographic regions of BC can be found in BritishColumbia Ministry of Forests [1995] and Holland [1964].Figure 4.1 also highlights in blue the domain of the calibrated VIC model runby the Pacific Climate Impacts Consortium [Pacific Climate Impacts Consortium,2014] for four BC basins: the Peace, upper Columbia, Fraser and Campbell Riverwatersheds. Table 4.1 lists the numbers of manual snow survey stations for both thecomplete BC-wide physiographic regions and for the portions of those regions thatoverlap the VIC domain. The stations within each of these domains will be used toevaluate ANN model configurations and to determine performance relative to theVIC model, respectively. Mean SWE values shown for the complete physiographicregions are highest in April, hence the April survey values will be considered peakseason SWE for the purposes of this work.74Figure 4.1: Physiographic Regions of British Columbia, Canada, overlaid on theGLOBE 1-km DEM displayed with a standard terrain color map. The five phys-iographic regions outlined in red are: (A) Coast Mountains and Islands, (B)Columbia Mountains and Southern Rockies, (C) Northern and Central Plateausand Mountains, (D) Interior Plateau, and (E) BC Plains. Mean SWE accumula-tions increase from region A to region E. Outlined in blue is the domain coveredby the VIC model as run by the Pacific Climate Impacts Consortium, which cov-ers four basins throughout BC: (I) the Peace, (II) upper Columbia, (III) Fraserand (IV) Campbell River watersheds. Manual snow survey station locationsare indicated by purple dots. Inset: A map of Canada adapted from Wikime-dia Commons highlights in red the province of BC, bordered to the south bythe continental United States (USA) and to the west by the US state of Alaska(AK).75Table 4.1: Physiographic Region SummaryRegion # stations by domain Area Mean Mean SWE (mm)BC-wide VIC (×106 ha) TRI (m) Jan Feb Mar Apr May JunCoast Mountains & Islands 57 27 25.3 560 402 523 673 840 821 637Columbia Mountains & Southern Rockies 70 66 11.3 485 309 428 515 590 581 387Northern & Central Plateaus & Mountains 43 24 28.6 268 253 276 343 397 378 331Interior Plateau 81 53 21.1 158 155 196 258 286 237 173BC Plains 5 3 10.6 67 49 68 84 89 15 0764.2 MethodsPrevious work ranked gridded SWE products over BC (Chapter 2) and tested theperformance of various combinations in an ANN model (Chapter 3). Manual snowsurveys were used as target data, and the three best performing products formedthe basis for the ANN which gave the lowest overall mean station MAE. Theseproducts were included in a base ANN that serves as a reference point for thiswork.The Modern-Era Retrospective Analysis for Research and Applications knownas MERRA [Rienecker et al., 2011] is a modern reanalysis product generated byNASA. MERRA relies on conventional observations (e.g. radiosondes, dropson-des, aircraft, ship and buoy) and satellite observations (SSM/I, TRMM, QuikScat,TOVS and MODIS), which are assimilated using the Goddard Earth ObservingSystem Data Assimilation System Version 5 (GEOS-5) [Rienecker et al., 2008].Native horizontal spatial resolution is one-half degree latitude by two-thirds degreelongitude. SWE was downloaded from the MERRA 2D land surface diagnosticsdata product (MAT1NXLND) using the data subsetter to retrieve a spatial domaincovering the study area.The Global Land Data Assimilation System GLDAS [Rodell et al., 2004] usesadvanced land surface models and data assimilation to ingest a host of ground andsatellite-based measurements, generating optimal land surface fields and fluxes.Produced by NASA, GLDAS runs using 15 minute timesteps to produce threehour output files. Four land surface models are driven by GLDAS: Mosaic, Com-mon Land Model (CLM) Version 2.0, Variable Infiltration Capacity (VIC) andNoah Version 3.3. GLDAS Version 2.0, hereinafter referred to as GLDAS2, drivesthe Noah LSM, which runs on a 0.25° × 0.25° grid for all land north of 60°S.Forced entirely with the Princeton meteorological forcing data set [Sheffield et al.,2006], GLDAS2 was initialized on January 1, 1948 using climatological soil statefields represented at four vertical layers. Common GLDAS data sets employedare the MODIS Land Cover Type product MCD12Q1 [Friedl et al., 2010], theMODIS 250 m land-water mask MOD44W [Carroll et al., 2009], the United Na-tions’ Food and Agriculture Organization (FAO) soil map [Reynolds et al., 1999]and the Global 30 Arc-Second Elevation (GTOPO30) product. SWE was extracted77from the GLDAS2 Noah-based three hour data set for the province of BC.The European Centre for Medium-Range Weather Forecasts (ECMWF) Re-analysis ERA-Interim [Dee et al., 2011] is a global atmospheric reanalysis avail-able from 1979 to present. A host of land surface, oceanographic, atmosphericand spaceborne measurements from numerous sources is assimilated using the In-tegrated Forecast System (IFS) to produce six hour field and flux files at T255spectral resolution, approximately 0.7°. ERA-Interim/Land [Balsamo et al., 2015],referenced in this work as ERALand, is an offline simulation of land surface pro-cesses using precipitation-adjusted ERA-Interim forcings and an improved landsurface model. Monthly accumulated precipitation adjustments are based on theGlobal Precipitation Climatology Project (GPCP) v2.1 [Huffman et al., 2009]. TheHydrology-Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL)land surface model contains several improvements over the ERA-Interim LSM: anupdated soil hydrology with a new formulation for hydrologic conductivity anddiffusivity [Balsamo et al., 2009], a new snow scheme including liquid water stor-age, a snowpack density reformulation and radiative improvements [Dutra et al.,2010], a vegetation climatology based on MODIS product MOD15A2 [Boussettaet al., 2013] and a lower stress threshold bare soil evaporation framework [Balsamoet al., 2011]. ERALand also closes the water balance, making it more appropriatefor climate-related studies than the ERA-Interim implementation. ERALand snowdepths in meters of water equivalent were downloaded from the ECMWF PublicDatasets web interface and converted to mm SWE.These gridded products were the principle inputs to a base ANN, which wasenhanced in this work. Other covariate predictors used in the base ANN included:1. latitude2. longitude3. station elevation4. station grid elevation difference5. sin(2pidn365.25), where dn is the day of the year (dn = 1 on January 1)6. cos(2pidn365.25)7. year8. physiographic region78Manual snow survey station latitudes, longitudes and elevations were suppliedby the BC Ministry of Environment, as were observations and dates. Observationdays of the year (dn) were represented as shown in the above covariate predic-tor list, while simple integer values were used for years. Station grid elevationdifferences were determined based on the Global Land One-kilometer Base Ele-vation (GLOBE) Digital Elevation Model [Hastings et al., 1999], a product of theNational Oceanic and Atmospheric Administration (NOAA). All GLOBE DEMvalues within gridded SWE product cells were averaged, and the elevations of sta-tions were subtracted from the mean grid cell elevations of their encompassingcells. Physiographic region boundaries were taken from Bulletin 48 of the BritishColumbia Ministry of Energy, Mines and Petroleum Resources [Holland, 1964].While large-scale gridded SWE products are expected to capture broad tem-poral patterns, local conditions also control in situ SWE. Slope and aspect affectsnow accumulation and distribution especially in the presence of wind, and thesein turn largely drive ablation patterns once the melt season commences [Ander-ton et al., 2004]. Roughness of the surrounding terrain alters circulation and snowdistribution patterns. Forest cover, while also an important factor in snow accumu-lation and melt, is not well documented at the survey site scale and hence was notconsidered in this work.Accordingly, slope, aspect and a measure of terrain ruggedness were includedas covariates in the ANN to be tested for performance improvements. All terraincalculations were based on the GLOBE 1-km DEM, with the elevation of DEMcells over water-covered areas masked to zero. Slope and aspect were calculatedusing the R [R Core Team, 2015] package SDMTools [VanDerWal et al., 2014]which utilizes the methods of Burrough and McDonnell [1998]. These covariateswere subsequently converted to Cartesian coordinates according to the decompo-sitions in equations 4.1 and 4.2:SAsin = slope× sin(aspect). (4.1)SAcos = slope× cos(aspect). (4.2)79The Terrain Ruggedness Index, TRI, was calculated according to equation 4.3 [Ri-ley et al., 1999]:TRI =[∑(xi j− x00)2] 12 , (4.3)where xi j = the elevation of each of the eight neighboring DEM cells to the centercell x00 for which TRI is being calculated.The R package EcoHydRology [Fuka et al., 2014] contains a snow accumula-tion and melt routine called SnowMelt that calculates SWE using an energy bal-ance approach but requires limited inputs comparable to those typically used in atemperature index model [Walter et al., 2005]. SnowMelt solves the full snowpackenergy balance using components estimated from daily minimum and maximumtemperature (Tmin and Tmax). Besides temperatures, the only required inputs are pre-cipitation, latitude and date. Energy components are calculated using parametersdefaulted to typical values (e.g. ground albedo = 0.25) and approximations such asthe Bristow-Campbell equation for incoming solar radiation [Bristow and Camp-bell, 1984]. Temperature and precipitation values were taken from daily AustralianNational University Spline (ANUSPLIN) fields [Hopkinson et al., 2011, McKen-ney et al., 2011] over BC which were bias-corrected to match 1971-2000 climatenormals from ClimateWNA [Wang et al., 2012] based on the Parameter-ElevationRegressions on Independent Slopes Model (PRISM) [Daly et al., 2008] and sub-sequently regridded to a 0.0625° resolution. SnowMelt was run using temperatureand precipitation from each of the regridded cells to create a SWE time series, fromwhich calculated SWE values were extracted for reported MSS observation dates.SnowMelt SWE series were compared to daily in situ data from AutomatedSnow Weather Stations (ASWS). The ASWS network is comprised of weather sta-tions throughout BC which are operated by the Ministry of Environment, BC Hy-dro, Rio Tinto Alcan and the Greater Vancouver Water District. The automated sta-tions include snow pillows, which measure SWE by reading the overlying pressureof the snowpack, as well as sensors for snow depth, air temperature and precipita-tion. Daily snow pillow data are available from the BC River Forecast Centre. Fig-ure 4.2 shows examples of 1999 snow season SWE traces for the SnowMelt model(black) and snow pillows (red) with 15 and 30 day time lags (orange). Four sta-tions are shown, one in each of BC’s physiographic regions except the BC Plains,80which has no snow pillows. For comparison, MSS SWE values and means of thethree constituent gridded SWE products are shown in purple circles and blue aster-isks respectively. Of the 32 pillows with available data for the 1999 snow season,the SnowMelt model melted off the snow prematurely 84% of the time. The timedifference between the SnowMelt model and pillow melt-off was generally in therange of a week to a month. In order to compensate for this discrepancy, a laggedsnow model predictor was introduced into the ANN. Lag combinations tested in-cluded the following: 0 days; 0 and 30 days; 0, 15 and 30 days; and 0, 7, 14, 21and 28 days. Calculated SWE values were extracted for reported MSS observationdates plus the number(s) of days in each lag combination and included as SWEpredictors in the ANN. Building the covariate predictor matrix in this way blendedthe SnowMelt model results along with compensation for its observed performancelimitations into an ANN that relied on gridded SWE products and other very dif-ferent sources of information.Using the base set of covariates, additional terrain information and a simplesnow model with lagged SWE values, an enhanced ANN was created using the Rpackage monmlp Cannon [2015]. Ten test splits were created by leaving out ev-ery tenth station from a list ordered by station ID, and mean station statistics werefound for each station in the split for which that station was left out. Predictor andpredictand data sets were constructed using the block bootstrap approach [Davisonand Hinkley, 1997]. Data blocks demarcated by station ID that ranged in size from10 to 30 years were drawn randomly and with replacement and then appended toarrays until they reached the lengths of the originals. The performance of eachnetwork configuration was measured using the out-of-bootstrap RMSE. This ap-proach enabled the use of a single cross-validation, which reduced computationaltime. A single hidden layer architecture was used for the ANN model, and theoptimal number of hidden neurons (HN) was determined by sequentially increas-ing HN from one to ten, followed by a logarithmically increasing series up to amaximum of 222. When three subsequent hidden neuron runs failed to achieve alower RMSE, the configuration with the lowest RMSE was selected for the giventest split. The rationale for this approach was to cover most possible configurationsfor simple networks but also reduce computational times using larger jumps in HNwhen more complicated networks gave lower RMSEs. A minimum of eight hidden81Nov Jan Mar May Jul050010001500lll ll(a)SWE (mm)Nov Jan Mar May Jul050010001500lll l(b)Nov Jan Mar May Jul050010001500llll(c)SWE (mm)Nov Jan Mar May Jul050010001500ll l(d)lSnow PillowMSSMean productSnowMelt modelSnowMelt lagsFigure 4.2: SWE traces for the 1999 snow season, including those of snow pillowsand the SnowMelt model with lags of 0, 15 and 30 days. Mean SWE valuesof gridded products MERRA, GLDAS2 and ERALand correspond to the datesof manual snow surveys (MSS). The following locations are shown: (a) station1B02 Tahtsa Lake in the Coast Mountains and Islands, (b) station 2C14 FloeLake in the Columbia Mountains and Southern Rockies, (c) station 4A02 PinePass in the Northern Plateaus and Mountains, and (d) station 4B15 Lu Lake inthe Interior Plateau. No automated snow pillows are located in the BC Plains.82neurons was run for each test split in order to prevent an insufficiently complex con-figuration from being selected, which was a problem for some of the networks. Anensemble of 50 members was averaged for each test split, and early stopping wasused to regularize individual ensemble members to prevent overfitting [Prechelt,1998, Hsieh, 2009]. A hyperbolic tangent activation function was used, and initialrandom weights were uniformly distributed in the range ±√0.8/din, where din isthe number of inputs or “fan-in” [Thimm and Fiesler, 1997].4.3 Results and Discussion4.3.1 Enhanced ANN ComparisonA series of enhanced ANN trials was run with different combinations of terraincovariates and snow model lags, which are listed in Table 4.2. Base predictorsare those enumerated in the previous section plus the three gridded SWE products:ERALand, MERRA and GLDAS2. Snow model (SM) lags are shown in daysand indicated according to color: red for SM0 (0 days), green for SM1 (0 and 30days), light blue for SM2 (0, 15 and 30 days), and cyan for SM3 (0, 7, 14, 21and 28 days). Topographic covariate combinations where included are indicatedby suffix: “T” for the Terrain Ruggedness Index (TRI), “sa” for slope and aspectin Cartesian format and “Tsa” for both. Figure 4.3 compares MAEs (panels aand b) and interannual correlations for the peak SWE survey in April, hereinafterreferred to as simply correlations, (panels c and d) for these runs with the base ANN(ANNbase) across BC. Mean station values are shown in bar plots (panels a and c).Whiskers denote 95% confidence intervals determined using the percentile methodon a non-parametric bootstrap of 5000 samples [Jolliffe, 2007]. Box plots (panelsb and d) depict the distributions of differences between the enhanced ANNs andthe ANN without terrain or snow model covariates. The relevant statistics (MAEsor correlations) were calculated on a station basis for each enhanced ANN andsubtracted from those of a benchmark, here ANNbase. The median (solid blackline), interquartile range (box) and 95% confidence interval (whiskers) were thendetermined using the same percentile method [Jolliffe, 2007].Enhanced ANNs in Figure 4.3 are ordered by snow model lag combinations83Table 4.2: Enhanced ANN Run Predictor CombinationsANN Name Base Snow Model TRI slope × slope ×Predictors Lags (d) sin(aspect) cos(aspect)ANNbase XSM0 X 0SM0T X 0 XSM0sa X 0 X XSM0Tsa X 0 X X XSM1 X 0, -30SM1T X 0, -30 XSM1sa X 0, -30 X XSM1Tsa X 0, -30 X X XSM2 X 0, -15, -30SM2T X 0, -15, -30 XSM2sa X 0, -15, -30 X XSM2Tsa X 0, -15, -30 X X XSM3 X 0, -7, -14, -21, -28SM3T X 0, -7, -14, -21, -28 XSM3sa X 0, -7, -14, -21, -28 X XSM3Tsa X 0, -7, -14, -21, -28 X X X(colored red, green, blue and cyan) and then by terrain covariate inclusion. Formean station MAE (panel a), there was not a large apparent advantage to increasingthe number and frequency of snow model lags. Within a given snow model lag con-figuration, however, adding topographic covariates tended to increase MAE. The95% confidence interval of MAE difference with ANNbase (panel b) did not crosszero for most ANNs, especially those that did not use slope and aspect. Where the95% confidence interval does not cross the horizontal zero line, the null hypothesisis rejected at the 5% level, and the difference between the two models is consid-ered statistically significant. For mean station correlation (panel c), increasing thenumber and frequency of snow model lags tended to improve performance, thoughfor a given set of lags there was usually little difference between the different topo-graphic covariate sets. When considered across the entire province, all enhancedANNs showed a statistically significant improvement in correlation over ANNbase(panel d) as none of the 95% confidence intervals crossed the horizontal zero line.84130140150160170180ANNbaseSM0SM0TSM0saSM0TsaSM1SM1TSM1saSM1TsaSM2SM2TSM2saSM2TsaSM3SM3TSM3saSM3Tsa(a)Mean Station MAE (mm SWE)0.680.700.720.740.76ANNbaseSM0SM0TSM0saSM0TsaSM1SM1TSM1saSM1TsaSM2SM2TSM2saSM2TsaSM3SM3TSM3saSM3Tsa(c)Mean Station April Correlation −25−20−15−10−50(b)SM0SM0TSM0saSM0TsaSM1SM1TSM1saSM1TsaSM2SM2TSM2saSM2TsaSM3SM3TSM3saSM3Tsa0. 4.3: Performance comparisons of ANNs using various combinations of snowmodel lags and topographic covariates with respect to the base ANN (gray bars)across the province of BC. Covariates included in each combination are given inTable 4.2. For each of the given combinations, mean station (a) MAEs and (b)MAE differences with respect to the base ANN as well as mean station April(c) correlations and (d) correlation differences with respect to the base ANN areshown. ANN SM2, highlighted in bold, was selected as the enhanced networkfor further evaluation.85Considering the performance of enhanced ANNs across BC, the inclusion oflagged snow model values tended to both decrease mean station MAE and increasemean station correlation. However, the addition of slope, aspect and TRI yieldedsmall increases in MAE with little to no improvements in correlation. This findingsuggested it was better to focus on improvements associated with the addition ofthe lagged snow model. The lowest overall mean station MAE (143 mm SWE)was found for the ANN using the base predictors described at the beginning ofthis section plus snow model lags of 0, 15 and 30 days but without topographiccovariates (SM2 in Figure 4.3). Consequently, SM2 was selected as the enhancedthree-product ANN, or “ANN3e”, to be further evaluated against other productsand models. The multiple linear regression (MLR) model using the same predictorsas ANN3e was identified as “MLR3e” in the analyses which follow.4.3.2 MAE and Correlation Values and DifferencesEvaluating the performance of products and models over just the VIC domain (Fig-ure 4.4) allows relative improvements to be gauged in comparison to VIC. Such acomparison is worthwhile because of the considerable differences in the times toset up and run the ANN and VIC, as noted in Section 4.1. Within the VIC do-main, ANN3e outperformed MERRA, GLDAS2, ERALand, the mean of thosethree products, SnowMelt, MLR3e and VIC in both MAE and correlation. Figure4.4 panels a and c present mean station MAEs and correlations of each product orcombination, while panels b and d show mean station MAE and correlation differ-ences with respect to ANN3e. 95% confidence intervals are indicated by whiskerson each bar and box. These results were calculated for stations in the VIC domainin order to enable comparisons with VIC and were comparable to those calculatedfor all stations across BC.In Figure 4.4 panels a and b lower values in mean station MAE and higherMAE differences (i.e. compared MAE − ANN3e MAE > 0) indicate better per-formance by ANN3e. Averaging the constituent products resulted in a 3% to 7%performance improvement over the individual products. The ANUSPLIN-drivenSnowMelt model had a mean station MAE that was an average of 31% lower thanthose of the gridded products, and the mean performance improvement of MLR3e86050100150200250300MERRAGLDAS2ERALandMean3ANNbaseSnowMeltMLR3eANN3eVIC(a)Mean Station MAE (mm SWE) Station April Correlation 050100150200(b)MERRAGLDAS2ERALandMean3ANNbaseSnowMeltMLR3eVIC−0.25−0.15−0.050.00(d)MERRAGLDAS2ERALandMean3ANNbaseSnowMeltMLR3eVICFigure 4.4: Performance comparisons of the best three gridded SWE products(MERRA, GLDAS2 and ERALand), the means of those products (Mean3),the ANUSPLIN-driven EcoHydRology SnowMelt model (SnowMelt), the en-hanced three-product MLR (MLR3e) and the Variable Infiltration Capacitymodel (VIC) with the enhanced three-product ANN (ANN3e) at stations withinthe VIC domain. Mean station (a) MAEs and (b) MAE differences with respectto ANN3e as well as mean station April (c) correlations and (d) correlation dif-ferences with respect to ANN3e are shown.87was larger still at 45%. ANN3e though had the lowest MAE of all, improving onthe products by an average of 56%. ANN3e also had a mean station MAE 31%lower than the calibrated hydrologic model VIC. Considering mean station MAEdifferences with respect to ANN3e, the ANN3e performance improvements overall other products and models were statistically significant at the 5% level, as noneof the confidence intervals crossed the horizontal zero line.Figure 4.4 panels c and d show better ANN3e performance as indicated by itshigher mean station correlations and negative correlation differences (i.e. correla-tion− ANN3e correlation < 0). Correlations for the gridded products ranged from0.52 to 0.69, while those for SnowMelt, VIC and various product combinationsand models (Mean3, ANNbase and MLR3e) were similar to one another (0.68 to0.74) and in many cases had overlapping confidence intervals. ANN3e averageda 31% improvement in correlation over the gridded products and 5% to 13% overother products and models, with a 12% improvement over VIC. The 95% confi-dent levels for correlation differences among all products and models with respectto ANN3e did not cross the horizontal zero line, indicating ANN3e yielded a sta-tistically significant improvement in the representation of interannual April SWEvariations.4.3.3 Regional MAE and Correlation DifferencesRegional breakouts of these metrics yield insight into the performance of ANN3ein different parts of BC. Following the confidence intervals for differences ap-proach of Jolliffe [2007], MAE and correlation differences were calculated be-tween ANN3e and each product and combination. Stations were grouped by phys-iographic region, and 5000 bootstrap samples of these differences were drawn forjust the stations within each region. The box plots in Figure 4.5 summarize thedistributions of the means of these differences in MAE and correlation for each ofBC’s five physiographic regions. Where 95% of the distribution as indicated bythe whiskers does not cross the horizontal zero line, the difference is statisticallysignificant.ANN3e performance improvements in MAE are indicated by positive differ-ences of the compared products and models in Figure 4.5a. Statistically significant8801002003004000100200300400010020030040001002003004000100200300400(a)MAE Difference (mm SWE) Coast Mts & IslandsColumbias & S RockiesNorth Plateaus & MtsInterior PlateauBC PlainsMERRAGLDAS2ERALandMean3ANNbaseSnowMeltMLR3eVIC−0.4−0.3−0.2−0.10.0−0.4−0.3−0.2−0.10.0−0.4−0.3−0.2−0.10.0−0.4−0.3−0.2−0.10.0−0.4−0.3−0.2−0.10.0(b)April Correlation DifferenceMERRAGLDAS2ERALandMean3ANNbaseSnowMeltMLR3eVICFigure 4.5: Regional comparison of the various products and models relative toANN3e over the VIC domain. Mean station (a) MAE differences and (b) Aprilcorrelation differences with respect to ANN3e are shown for the gridded SWEproducts and their means, ANNbase, the ANUSPLIN-driven SnowMelt modelresults, MLR, and VIC.89MAE improvements by ANN3e were seen in the Coast Mountains and Islands,Columbia Mountains and Southern Rockies, and Interior Plateau, with the greatestdifferences seen in the Coast Mountains and Islands for all products and models.MAE improvements were also found in the Northern and Central Plateaus andMountains, but those differences were not statistically significant for SnowMelt,MLR3e and VIC. MAE improvements were not seen in the BC Plains, possiblya result of the fact that it is the region of lowest SWE accumulations and feweststations.In Figure 4.5b correlation improvements by ANN3e are signified by nega-tive differences relative to compared products and models. Statistically signif-icant correlation difference improvements were determined in the Coast Moun-tains and Islands, the Columbia Mountains and Southern Rockies (except relativeto ANNbase), and the Interior Plateau. The greatest correlation differences weremostly seen in the Coast Mountains and Islands. ANN3e also exhibited the bestcorrelations in the Northern and Central Plateaus and Mountains (except relativeto Mean3) and in the BC Plains, though only for GLDAS2 and SnowMelt were theimprovements statistically significant at the 5% level for both regions.4.3.4 SWE Maps and Station PredictionsSWE maps of the major SWE inputs to the enhanced neural network are presentedin Figure 4.6 for April 1, 1999, a high snow year. Major spatial patterns of the mapsshowed some similarities. Relatively high SWE values were apparent in the CoastMountains and the Columbias and Southern Rockies, with the largest accumula-tions along the western border of BC. The central parts of the province receivedsignificantly less snow, and accumulations in the northeastern corner were not vis-ible on the scale of those in the rest of the province. Differences in the magnitudesof SWE were apparent, however. Very high SWE values of 3000 mm and greaterwere seen for MERRA in the southwest (panel a), for ERALand in the far north-western corner (panel c) and for SnowMelt all along the western border (paneld). The resolutions of the gridded input SWE fields also varied considerably from0.75° for ERALand down to under 7 km for the ANUSPLIN-driven SnowMeltmodel.90Figure 4.6: SWE maps for British Columbia for April 1, 1999 for (a) MERRA, (b)GLDAS2, (c) ERALand and (d) ANUSPLIN-driven SnowMelt model results.Physiographic regions are outlined in red. Land and water masks were cre-ated using physiographic region borders [Holland, 1964] and the GLOBE 1-kmDEM [Hastings et al., 1999].Using ANN3e and the gridded input covariates over BC, a SWE map of BCwas constructed for April 1 of the high snow year 1999 (Figure 4.7a). Physio-graphic regions are outlined in red, and purple circles indicate the locations ofmanual snow survey stations, for which one example per region is shown. HighSWE values of 2000 mm and more in light grays to bright whites were prominentthroughout the Coast Mountains along the province’s western border and to a lesserextent in the Columbias and Southern Rockies on the southeastern border. SmallerSWE amounts corresponding to darker grays were found in the central parts ofthe province. In the northeastern corner snow accumulations were not discernible91Figure 4.7: SWE map of British Columbia for April 1, 1999 generated by ANN3e.(a) SWE map shows physiographic regions (red outlines) and locations of fiveMSS stations (purple circles) with letters that correspond to the time series thatfollow. These MSS SWE values and traces from MERRA, GLDAS2, ERALand,MLR3e and ANN3e are shown for snow seasons 1999 to 2005 for the followingstations: (b) station 1D08 Stave Lake, (c) station 2A25 Kirbyville Lake, (d)station 4B08 Mount Cronin, (e) station 2F11 Isintok Lake, (f) station 4C05 FortNelson Airport. MLR3e and ANN3e are masked outside January 2 and June 15as in Figure 3.392relative to peak SWE values in other parts of the province. Figure 4.7 panels bthrough f show the SWE time series at each station for the observations, the grid-ded SWE products MERRA, GLDAS2 and ERALand, MLR3e and ANN3e for the1999 through 2005 snow seasons. For these time series the gridded SWE productsoften underestimated SWE relative to observed MSS values, particularly at sta-tions with higher elevations and hence higher accumulations (panels b though d).MLR3e largely corrected this underestimate, though the ANN3e correction was attimes larger and closer to measurements (e.g. panel c). At stations with lower ac-cumulations, both MLR3e and ANN3e tended to overestimate the measured values(e.g. panel f), but this overcorrection was less severe for ANN3e. An animation ofthe weekly province-wide SWE for the 1999 snow season as found by ANN3e isincluded in Supplement A. This animation shows a peak SWE on approximatelyApril 1, with most seasonal snow melting off by the mean end of season surveydate.4.4 ConclusionAccurate regional estimates of SWE are critical for quality hydrologic forecastingand planning in areas of snowmelt dominated hydrologic regimes. While somereadily available gridded SWE products provide reasonable representations of in-terannual variability, they all underestimate SWE in the difficult context of BritishColumbia, Canada, a mountainous area with extensive forest cover and high snowaccumulations. A base ANN combining the best performing gridded products(MERRA, GLDAS2 and ERALand) and relevant site covariates has been shownin Chapter 3 to improve MAEs relative to those products as well as the calibratedhydrologic model VIC within the province. Within the individual physiographicregions of BC, however, the base ANN was not consistently the best performer.This work sought to enhance the ANN to improve its performance across allregions within BC by incorporating additional terrain covariates and output froma simple snow model. Tested terrain covariates included slope, aspect and Ter-rain Ruggedness Index (TRI), a measure of surface roughness [Riley et al., 1999].When compared with daily snow pillow data, a simplified snowpack energy bal-ance model based on Walter et al. [2005] and driven by bias-corrected ANUSPLIN93temperature and precipitation values was found to reliably estimate snow accu-mulation, but snow melted off prematurely. Lagged predictors were created fromthe SWE time series using several lag sequences. Various combinations of terraincovariates and lagged snow model values were tested to find the best performingconfiguration.This best performing ANN configuration (ANN3e) blended a simple energybalance snow model (SnowMelt) with base ANN predictors including the threeconstituent gridded products (MERRA, GLDAS2 and ERALand). Comparisons ofthe three products, their means, ANNbase, SnowMelt, MLR3e, ANN3e and VICdetermined that ANN3e had the lowest mean station MAE and highest mean stationcorrelation across the province, which were statistically significant improvements.Breakouts of confidence interval differences relative to ANN3e determined perfor-mance improvements in MAE and correlation over other products and models inmost regions. Most importantly, ANN3e outperformed the calibrated VIC modelin most regions for both MAE and correlation.ANN3e was then used to generate a SWE map for all of BC for the peak of ahigh snow year, April 1, 1999. SWE traces at most individual stations showed bet-ter agreement with manual snow surveys, particularly during this high snow year.These results suggest that such an enhanced ANN may be a viable alternative tomore traditional hydrologic models, both in ability to estimate SWE on a regionalscale and in ease of setup and operation.Of the first hypothesis mentioned in the Introduction, our results confirmedthat the ANN model was indeed able to utilize the complementary informationcontained in the relatively high resolution but simple snow model and the sophis-ticated but lower-resolution gridded products to improve SWE estimates. Of thesecond hypothesis, our results were unable to confirm that high-resolution topo-graphic information can lead to further improvement in SWE estimates by ANN.While the inclusion of slope, aspect and terrain ruggedness in enhanced ANNs oc-casionally resulted in small correlation improvements, these improvements werenot consistent, and corresponding MAEs actually increased. Future studies mayexplore the sensitivity of local SWE estimates to DEM resolution and topographiccovariate design.94Chapter 5ConclusionAccurate estimates of snow accumulations are of critical importance to hydrologicforecasting and planning especially for hydrologic regimes dominated by springsnowmelt. These estimates are particularly challenging in BC, a mixed alpine re-gion with complex topography, heavy forest cover and heavy snow accumulations.Better representations of regional-scale SWE can lead to improvements such asstreamflow forecasting, a key desire of many water resources planners and engi-neers.Eight gridded SWE products were compared with manual snow surveys acrossthe province of BC and within its five physiographic regions. All products onaverage had positive interannual correlations but underestimated SWE to varyingdegrees. Median correlations generally decreased and the magnitudes of biasesand MAEs increased as the snow season progressed, indicating a declining abilityto capture interannual variability. Performance statistics generally showed betterestimation ability in physiographic regions with lower SWE accumulations. Thisanalysis determined the following ranking of gridded products in order of decreas-ing performance: ERALand, GLDAS2, MERRA, CMC, GLDAS1, MERRALand,GlobSnow and ERAInterim.The six gridded data products from the original group that covered the studyperiod of 1980 to 2010 were fused in various combinations with a base set of co-variates using an ANN. Other machine learning methods were tested, includingBayesian Neural Networks [MacKay, 1992, Foresee and Hagan, 1997] and Sup-95port Vector Machines combined with evolutionary strategy [Lima et al., 2013], butthese did not perform better than the ANN despite increased computational costs.Gridded products were run in an ANN individually as well as with all better per-forming products, and all possible combinations of the best three were also run.ANNs which used multiple products performed better than those using only oneproduct, and the best performing ANN consisted of the best three products pluscovariates. The three-product ANN (ANN3) had higher mean station correlationsand lower mean station MAEs than the individual products and their means. ANN3also outperformed comparable MLR models within BC and in four out of fivephysiographic regions with statistical significance. Comparisons of ANN3 withthe calibrated VIC hydrologic model revealed statistically significant performanceimprovements across BC. However, ANN3 performance within the physiographicregions was mixed, with statistically significant improvements found in no morethan two of five regions.The final part of this work sought to improve the performance of the ANNby incorporating supplementary data sources. Various data sources were consid-ered for the ANN but were ruled out or did not result in notable improvements.Representations of land cover such as the satellite-derived Normalized DifferenceVegetation Index (NDVI) and Enhanced Vegetation Index (EVI) were primary co-variate candidates. However, although survey sites are initially located in open orcleared areas, virtually no reliable information on the evolution of local canopyclosure is maintained. Since site forest cover conditions are not readily available,the inclusion of land cover could not be justified.Microwave brightness temperatures (TBs) derived from spaceborne platformshave been a key data source in SWE retrievals since early in the satellite era [Changet al., 1987, Kongoli et al., 2004]. Raw TBs were introduced into the ANN, butthese resulted in increases in mean station MAEs (results not shown), likely be-cause of the introduction of considerable noise into the signal. While it is possi-ble some of this noise may be filtered or counterbalanced by using available datasources (e.g. atmospheric contribution as represented by NASA’s MOD6 product),other contributers to noise may not be so easily isolated (e.g. forest cover, whichis not well documented at survey sites, as previously mentioned). As most of theSWE gridded products already include microwave data, it was decided not to pur-96sue this avenue further.El Nin˜o–Southern Oscillation (ENSO) states were also tested as a predictor inthe ANN. Tested ENSO data sets included the Multivariate ENSO Index (MEI)[Wolter and Timlin, 2011], Bivariate ENSO Timeseries (“BEST”) [Smith et al.,2000] and Extended Reconstructed Sea Surface Temperature (ERSST), Version 4[Huang et al., 2015]. While some small improvements were initially seen, the in-troduction of a province-wide predictor evolving through time further complicatedthe required model structure. The likelihood of temporal auto-correlation in theENSO predictor would have added another contraint to the parsing of the stationdata into test splits, necessitating a more complex cross-validation scheme and cor-responding increases in the number of runs and computational time. Furthermore,the incorporated gridded SWE products also reflect the ENSO signal. As a result,ENSO was also dropped from consideration as a predictor.Additional terrain covariates of slope, aspect and Terrain Ruggedness Index(TRI) were calculated using the GLOBE 1-km DEM. In one isolated case, themean station MAE was lowest when TRI was included as a predictor, but the gen-eral trend was that the inclusion of these terrain covariates did not lower mean sta-tion MAE or notably change April correlations (Figure 4.4). Consequently, slope,aspect and TRI were not included in the final model.A simple one-layer snow model driven by bias-corrected ANUSPLIN tem-perature and precipitation fields was also run for the province. SWE outputs ofthis model were compared with snow pillow measurements and found to melt tooquickly at the end of the season, so the calculated SWE values were passed to theANN with various time lag values. Results of ANNs using different combinationsof lagged snow model SWE values were compared to select an enhanced three-product ANN (ANN3e) for further analysis. Lags of 0, 15 and 30 days resulted inthe lowest mean station MAE with little difference in April correlation, and hencethese snow model lags were used in ANN3e. ANN3e outperformed all other prod-ucts and models across the province and within most of the physiographic regionswith statistical significance. Notably, ANN3e achieved statistically significant im-provements over VIC in MAEs for all but two regions and in correlations for allbut one.The BC Plains was the region that consistently saw the poorest performance97by the neural network models and was in fact better represented by the individualgridded products and VIC. This result is likely related to the differences in topo-graphic and climatic conditions from the rest of BC, as well as the limited numberof survey stations to represent these conditions. The BC Plains are characterizedby much flatter topography and far lower snow accumulations than other physio-graphic regions. While the ANNs must generally correct large snow underesti-mates in most of BC, little to no correction is necessary in the BC Plains. Settingup a separate model for each of the physiographic regions may be an alternativeapproach that could lead to better SWE estimates. But the snow survey data limi-tation particularly in the BC Plains may make a separate machine learning modeldifficult to implement even when site conditions and snow accumulations are nothighly variable.Future work may consider the sensitivity of SWE estimates to DEM resolution,topographic covariate design and other model specifications. The generated SWEestimates may be used for a host of applications, such as initializing hydrologicmodels with a data set that draws on additional inputs not currently used by thesemodels. The ANN may be expanded spatially by incorporating SWE observationsfrom a wider area, though the availability of in situ data and high variability in siteconditions may present limitations as described for the BC Plains. Other limita-tions such as the lack of physical process representations may preclude the use ofthe ANN approach for diagnostic applications. Furthermore, predictor dependen-cies are difficult to estimate, especially with correlated predictors. However, theANN model has successfully demonstrated the ability to combine complementaryinformation from a variety of sources in order to produce improved SWE estimatesin BC.The results of this work are representative of a new class of data products thattake advantage of machine learning to fuse data from multiple sources. Such datasets have a variety of potential applications. Initially conceived as a means to bet-ter initialize snow states in VIC by incorporating additional data sources, the ANNmodel results can be assimilated into regional hydrologic models to improve waterresources forecasting and planning efforts. Applying the ANN approach to nearreal-time data, better estimates of regional-scale SWE could be used for applica-tions ranging from advanced avalanche warning systems to improved ski condition98forecasts. Finally, as a quasi-observational gridded data product, the results ofthis work could be used as a benchmark for other models and products, enablingregional intercomparisons of peak SWE magnitudes, interannual variability anddecadal trends.The use of machine learning techniques to solve hydrologic and cryosphereproblems is still in relatively early adoption stages. The growing volumes of avail-able observational data and model results will likely lead to further demands fortechniques that can utilize these resources. This work demonstrates the ability ofmachine learning to perform data fusion of multiple and often complementary datasources, e.g. sophisticated lower-resolution gridded products and higher-resolutionsimplified snow models, for improved SWE estimation. Consequently, this workserves as a prototype for future applications of data fusion approaches in the esti-mation of snow water equivalent on regional scales.99BibliographyR. J. Abrahart, L. M. See, and D. P. Solomatine. Practical Hydroinformatics:Computational Intelligence and Technological Developments in WaterApplications. Water Science and Technology Library. Springer Science &Business Media, 2008. → pages 2S. Anderton, S. White, and B. Alvera. Evaluation of spatial variability in snowwater equivalent for a high mountain catchment. Hydrological Processes, 18(3):435–453, 2004. → pages 10, 11, 51, 70, 72, 79J. Aschbacher. Land surface studies and atmospheric effects by satellitemicrowave radiometry. PhD thesis, University of Innsbruck, Innsbruck,Austria, 1989. → pages 43, 44, 69G. Balsamo, A. Beljaars, K. Scipal, P. Viterbo, B. van den Hurk, M. Hirschi, andA. K. Betts. A revised hydrology for the ECMWF model: Verification fromfield site to terrestrial water storage and impact in the Integrated ForecastSystem. Journal of Hydrometeorology, 10(3):623–643, 2009. → pages 78G. Balsamo, S. Boussetta, E. Dutra, A. Beljaars, P. Viterbo, and B. Van den Hurk.Evolution of land surface processes in the IFS. ECMWF Newsletter, 127:17–22, 2011. → pages 78G. Balsamo, C. Albergel, A. Beljaars, S. Boussetta, E. Brun, H. Cloke, D. Dee,E. Dutra, J. Mun˜oz-Sabater, F. Pappenberger, P. de Rosnay, T. Stockdale, andF. Vitart. ERA-Interim/Land: a global land surface reanalysis data set.Hydrology and Earth System Sciences, 19(1):389–407, 2015.doi:10.5194/hess-19-389-2015. → pages 7, 8, 12, 22, 47, 78R. Barry, M. Pre´vost, J. Stein, and A. P. Plamondon. Application of a snow coverenergy and mass balance model in a balsam fir forest. Water ResourcesResearch, 26(5):1079–1092, 1990. → pages 70100E. Binaghi, V. Pedoia, A. Guidali, and M. Guglielmin. Snow cover thicknessestimation using radial basis function networks. The Cryosphere, 7(3):841–854, 2013. doi:10.5194/tc-7-841-2013. → pages 44, 69C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press,1995. → pages 51C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. →pages 2, 4G. Blo¨schl. Scaling issues in snow hydrology. Hydrological Processes, 13(14-15):2149–2175, 1999. → pages 71G. Blo¨schl, R. Kirnbauer, and D. Gutknecht. Distributed snowmelt simulations inan alpine catchment: 1. model evaluation on the basis of snow cover patterns.Water Resources Research, 27(12):3171–3179, 1991. → pages 70S. Boussetta, G. Balsamo, A. Beljaars, T. Kral, and L. Jarlan. Impact of asatellite-derived Leaf Area Index monthly climatology in a global NumericalWeather Prediction model. International Journal of Remote Sensing, 34(9-10):3520–3542, 2013. → pages 78K. L. Bristow and G. S. Campbell. On the relationship between incoming solarradiation and daily maximum and minimum temperature. Agricultural andForest Meteorology, 31(2):159–166, 1984. → pages 71, 80British Columbia Ministry of Forests. 1994 Forest, Recreation, and RangeResource Analysis. Crown Publications, 1995. → pages 13, 45, 74R. Brown and B. Brasnett. Canadian Meteorological Centre (CMC) daily snowdepth analysis data, 2015. Environment Canada, 2010. Boulder, ColoradoUSA: National Snow and Ice Data Center. → pages 7, 12, 24R. D. Brown and P. W. Mote. The response of Northern Hemisphere snow coverto a changing climate. Journal of Climate, 22(8):2124–2145, 2009. → pages 25E. Brun, V. Vionnet, A. Boone, B. Decharme, Y. Peings, R. Valette, F. Karbou,and S. Morin. Simulation of northern Eurasian local snow depth, mass, anddensity using a detailed snowpack model and meteorological reanalyses.Journal of Hydrometeorology, 14(1):203–219, 2013. → pages 40P. A. Burrough and R. A. McDonnell. Principles of Geographical InformationSystems. Oxford University Press, 1998. → pages 79101G. S. Campbell. An Introduction to Environmental Biophysics. Springer, 1977. →pages 71G. S. Campbell and J. M. Norman. An Introduction to Environmental Biophysics.Springer, 1998. → pages 71A. J. Cannon. monmlp: Monotone Multi-Layer Perceptron Neural Network. Rpackage version 1.1.3, 2015. → pages 4, 48, 81A. J. Cannon and I. G. McKendry. A graphical sensitivity analysis for statisticalclimate models: application to Indian monsoon rainfall prediction by artificialneural networks and multiple linear regression models. International Journal ofClimatology, 22(13):1687–1708, 2002. doi:10.1002/joc.811. → pages 48M. L. Carrera, S. Be´lair, V. Fortin, B. Bilodeau, D. Charpentier, and I. Dore´.Evaluation of snowpack simulations over the Canadian Rockies with anexperimental hydrometeorological modeling system. Journal ofHydrometeorology, 11(5):1123–1140, 2010. → pages 40M. Carroll, J. R. Townshend, C. M. DiMiceli, P. Noojipady, and R. A. Sohlberg.A new global raster water mask at 250 m resolution. International Journal ofDigital Earth, 2(4):291–308, 2009. → pages 77A. Chang, J. Foster, and D. Hall. Nimbus-7 SMMR derived global snow coverparameters. Annals of Glaciology, 9(9):39–44, 1987.doi:10.3189/S0260305500200736. → pages 10, 43, 44, 69, 96A. Chang, J. Foster, and D. Hall. Effects of forest on the snow parameters derivedfrom microwave measurements during the BOREAS winter field campaign.Hydrological Processes, 10(12):1565–1574, 1996.doi:10.1002/(SICI)1099-1085(199612)10:12〈1565::AID-HYP501〉3.0.CO;2-5.→ pages 42, 68C.-T. Chen, B. Nijssen, J. Guo, L. Tsang, A. W. Wood, J.-N. Hwang, and D. P.Lettenmaier. Passive microwave remote sensing of snow constrained byhydrological simulations. IEEE Transactions on Geoscience and RemoteSensing, 39(8):1744–1756, 2001. doi:10.1109/36.942553. → pages 43, 69M. Chen, W. Shi, P. Xie, V. Silva, V. E. Kousky, R. Wayne Higgins, and J. E.Janowiak. Assessing objective techniques for gauge-based analyses of globaldaily precipitation. Journal of Geophysical Research: Atmospheres, 113(D4),2008. → pages 23102E. Cordisco, C. Prigent, and F. Aires. Snow characterization at a global scale withpassive microwave satellite observations. Journal of Geophysical Research:Atmospheres, 111(D19), 2006. → pages 10P. Coulibaly, F. Anctil, and B. Bobee. Multivariate reservoir inflow forecastingusing temporal neural networks. Journal of Hydrologic Engineering, 6(5):367–376, 2001. → pages 69C. Daly, M. Halbleib, J. I. Smith, W. P. Gibson, M. K. Doggett, G. H. Taylor,J. Curtis, and P. P. Pasteris. Physiographically sensitive mapping ofclimatological temperature and precipitation across the conterminous unitedstates. International Journal of Climatology, 28(15):2031–2064, 2008. →pages 80A. C. Davison and D. V. Hinkley. Bootstrap Methods and Their Application,volume 1. Cambridge University Press, 1997. → pages 49, 81M. Decker, M. A. Brunke, Z. Wang, K. Sakaguchi, X. Zeng, and M. G.Bosilovich. Evaluation of the reanalysis products from GSFC, NCEP, andECMWF using flux tower observations. Journal of Climate, 25(6):1916–1944,2012. → pages 35D. Dee, S. Uppala, A. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae,M. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A. Beljaars, L. van de Berg,J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A. Geer,L. Haimberger, S. Healy, H. Hersbach, E. Ho´lm, L. Isaksen, P. Ka˚llberg,M. Ko¨hler, M. Matricardi, A. McNally, B. Monge-Sanz, J.-J. Morcrette, B.-K.Park, C. Peubey, P. de Rosnay, C. Tavolato, J.-N. The´paut, and F. Vitart. TheERA-Interim reanalysis: Configuration and performance of the dataassimilation system. Quarterly Journal of the Royal Meteorological Society,137(656):553–597, 2011. doi:10.1002/qj.828. → pages 6, 8, 12, 22, 47, 78C. Derksen. The contribution of AMSR-E 18.7 and 10.7 GHz measurements toimproved boreal forest snow water equivalent retrievals. Remote Sensing ofEnvironment, 112(5):2701–2710, 2008. doi:10.1016/j.rse.2008.01.001. →pages 42, 44, 68, 69Y. B. Dibike and P. Coulibaly. TDNN with logical values for hydrologic modelingin a cold and snowy climate. Journal of Hydroinformatics, 10(4):289–300,2008. → pages 69E. Dutra, G. Balsamo, P. Viterbo, P. M. Miranda, A. Beljaars, C. Scha¨r, andK. Elder. An improved snow scheme for the ECMWF land surface model:103description and offline validation. Journal of Hydrometeorology, 11(4):899–916, 2010. → pages 78E. Dutra, S. Kotlarski, P. Viterbo, G. Balsamo, P. Miranda, C. Scha¨r, P. Bissolli,and T. Jonas. Snow cover sensitivity to horizontal resolution,parameterizations, and atmospheric forcing in a land surface model. Journal ofGeophysical Research: Atmospheres, 116(D21):D21109, 2011. ISSN2156-2202. doi:10.1029/2011JD016061. → pages 42, 68E. Dutra, P. Viterbo, P. M. Miranda, and G. Balsamo. Complexity of snowschemes in a climate model and its impact on surface energy and hydrology.Journal of Hydrometeorology, 13(2):521–538, 2012. → pages 10, 40N. D. Evora and P. Coulibaly. Recent advances in data-driven modeling of remotesensing applications in hydrology. Journal of Hydroinformatics, 11(3-4):194–201, 2009. doi:10.2166/hydro.2009.036. → pages 44, 69N. D. Evora, D. Tapsoba, and D. De Se`ve. Combining artificial neural networkmodels, geostatistics, and passive microwave data for snow water equivalentretrieval and mapping. IEEE Transactions on Geoscience and Remote Sensing,46(7):1925–1939, 2008. doi:10.1109/TGRS.2008.916632. → pages 44, 69R. Ferguson. Snowmelt runoff models. Progress in Physical Geography, 23(2):205–227, 1999. → pages 71R. Fletcher. A new approach to variable metric algorithms. The ComputerJournal, 13(3):317–322, 1970. doi:10.1093/comjnl/13.3.317. → pages 48R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. TheComputer Journal, 7(2):149–154, 1964. → pages 3F. Foresee and M. Hagan. Gauss-Newton approximation to Bayesianregularization. In IEEE 1997 International Joint Conference on NeuralNetworks (IJCNN97), pages 9–12, 1997. → pages 95B. A. Forman and R. H. Reichle. Using a support vector machine and a landsurface model to estimate large-scale passive microwave brightnesstemperatures over snow-covered land in North America. IEEE Journal ofSelected Topics in Applied Earth Observations and Remote Sensing, 8(9):4431–4441, 2015. doi:10.1109/JSTARS.2014.2325780. → pages 53J. Foster, A. Chang, and D. Hall. Comparison of snow mass estimates from aprototype passive microwave snow algorithm, a revised algorithm and a snow104depth climatology. Remote Sensing of Environment, 62(2):132–142, 1997.doi:10.1016/S0034-4257(97)00085-0. → pages 43M. A. Friedl, D. Sulla-Menashe, B. Tan, A. Schneider, N. Ramankutty, A. Sibley,and X. Huang. MODIS Collection 5 global land cover: Algorithm refinementsand characterization of new datasets. Remote Sensing of Environment, 114(1):168–182, 2010. → pages 77D. R. Fuka, M. T. Walter, J. A. Archibald, T. S. Steenhuis, and Z. M. Easton.EcoHydRology: A community modeling foundation for Eco-Hydrology. Rpackage version 0.4.12, 2014. → pages 80T. Y. Gan. Passive microwave snow research in the Canadian high Arctic.Canadian Journal of Remote Sensing, 22(1):36–44, 1996.doi:10.1080/07038992.1996.10874635. → pages 44, 69I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. AdaptiveComputation and Machine Learning series. MIT Press, 2016. → pages 5R. S. Govindaraju and A. R. Rao. Artificial Neural Networks in Hydrology.Springer Science & Business Media, 2000. → pages 2J. Guo, L. Tsang, E. G. Josberger, A. W. Wood, J.-N. Hwang, and D. P.Lettenmaier. Mapping the spatial distribution and time evolution of snow waterequivalent with passive microwave measurements. IEEE Transactions onGeoscience and Remote Sensing, 41(3):612–621, 2003.doi:10.1109/TGRS.2003.808907. → pages 43, 69D. A. Hastings, P. K. Dunbar, G. M. Elphingstone, M. Bootz, H. Murakami,H. Maruyama, H. Masaharu, P. Holland, J. Payne, N. A. Bryant, T. L. Logan,J.-P. Muller, G. Schreier, and J. S. MacDonald. The Global Land One-kilometerBase Elevation (GLOBE) Digital Elevation Model, version 1.0. Technicalreport, National Oceanic and Atmospheric Administration, 325 Broadway,Boulder, Colorado 80305-3328, U.S.A., 1999. → pages xviii, 8, 14, 44, 51, 79,91M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linearsystems. Journal of Research of the National Bureau of Standards, 49(6), 1952.→ pages 3R. Hock. A distributed temperature-index ice- and snowmelt model includingpotential direct solar radiation. Journal of Glaciology, 45(149):101–111, 1999.→ pages 71105S. S. Holland. Landforms of British Columbia, A Physiographic Outline. NumberBulletin 48, British Columbia Ministry of Energy, Mines and PetroleumResources. Printer to the Queen, 1964. → pages xviii, 45, 74, 79, 91R. F. Hopkinson, D. W. McKenney, E. J. Milewska, M. F. Hutchinson,P. Papadopol, and L. A. Vincent. Impact of aligning climatological day ongridding daily maximum–minimum temperature and precipitation over Canada.Journal of Applied Meteorology and Climatology, 50(8):1654–1665, 2011. →pages 8, 80W. W. Hsieh. Machine Learning Methods in the Environmental Sciences: NeuralNetworks and Kernels. Cambridge University Press, 2009. → pages xi, 2, 4,43, 48, 51, 83W. W. Hsieh and B. Tang. Interannual variability of accumulated snow in theColumbia basin, British Columbia. Water Resources Research, 37(6):1753–1759, 2001. → pages 1K.-L. Hsu, X. Gao, S. Sorooshian, and H. V. Gupta. Precipitation estimation fromremotely sensed information using Artificial Neural Networks. Journal ofApplied Meteorology, 36(9):1176–1190, 1997. → pages 69K.-l. Hsu, H. V. Gupta, X. Gao, and S. Sorooshian. Estimation of physicalvariables from multichannel remotely sensed imagery using a neural network:Application to rainfall estimation. Water Resources Research, 35(5):1605–1618, 1999. → pages 69B. Huang, V. Banzon, E. Freeman, J. Lawrimore, W. Liu, T. Peterson, T. Smith,P. Thorne, S. Woodruff, and H. Zhang. Extended Reconstructed Sea SurfaceTemperature (ERSST), Version 4. NOAA National Centers for EnvironmentalInformation, 10, 2015. → pages 97G. J. Huffman, R. F. Adler, D. T. Bolvin, and G. Gu. Improving the globalprecipitation record: GPCP Version 2.1. Geophysical Research Letters, 36(17),2009. → pages 78M. E. Jensen, R. D. Burman, and R. G. Allen. Evapotranspiration and IrrigationWater Requirements. ASCE, 1990. → pages 71I. T. Jolliffe. Uncertainty and inference for verification measures. Weather andForecasting, 22(3):637–650, 2007. doi:10.1175/WAF989.1. → pages 37, 56,60, 83, 88106G. Jost, R. D. Moore, R. Smith, and D. R. Gluns. Distributed temperature-indexsnowmelt modelling for forested catchments. Journal of Hydrology, 420:87–101, 2012. → pages 71C. Kongoli, N. C. Grody, and R. R. Ferraro. Interpretation of AMSU microwavemeasurements for the retrievals of snow water equivalent and snow depth.Journal of Geophysical Research: Atmospheres, 109(D24), 2004. → pages 10,96A. Langlois, A. Royer, F. Dupont, A. Roy, K. Goı¨ta, and G. Picard. Improvedcorrections of forest effects on passive microwave satellite remote sensing ofsnow over boreal and subarctic regions. IEEE Transactions on Geoscience andRemote Sensing, 49(10):3824–3837, 2011. → pages 11X. Liang, D. P. Lettenmaier, E. F. Wood, and S. J. Burges. A simplehydrologically based model of land surface water and energy fluxes for generalcirculation models. Journal of Geophysical Research, 99(D7):14415–14428,1994. doi:10.1029/94JD00483. → pages 54A. R. Lima, A. J. Cannon, and W. W. Hsieh. Nonlinear regression inenvironmental sciences by support vector machines combined withevolutionary strategy. Computers & Geosciences, 50:136–144, 2013. → pages96A. R. Lima, A. J. Cannon, and W. W. Hsieh. Nonlinear regression inenvironmental sciences using extreme learning machines: a comparativeevaluation. Environmental Modelling & Software, 73:175–188, 2015.doi:10.1016/j.envsoft.2015.08.002. → pages 53D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 2ndedition, 1984. → pages 3K. Luojus, J. Pulliainen, M. Takala, C. Derksen, H. Rott, T. Nagler, R. Solberg,A. Wiesmann, S. Metsamaki, E. Malnes, et al. Investigating the feasibility ofthe GlobSnow snow water equivalent data for climate research purposes. In2010 IEEE International Geoscience and Remote Sensing Symposium(IGARSS), pages 4851–4853. IEEE, 2010. → pages 24D. J. MacKay. Bayesian interpolation. Neural Computation, 4(3):415–447, 1992.→ pages 95W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent innervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.→ pages 2107D. W. McKenney, M. F. Hutchinson, P. Papadopol, K. Lawrence, J. Pedlar,K. Campbell, E. Milewska, R. F. Hopkinson, D. Price, and T. Owen.Customized spatial climate models for North America. Bulletin of theAmerican Meteorological Society, 92(12):1611–1622, 2011. → pages 8, 80R. Moore and I. McKendry. Spring snowpack anomaly patterns and winterclimatic variability, British Columbia, Canada. Water Resources Research, 32(3):623–632, 1996. → pages 1L. Mudryk, C. Derksen, P. Kushner, and R. Brown. Characterization of northernhemisphere snow water equivalent datasets, 1981–2010. Journal of Climate, 28(20):8037–8051, 2015. doi:10.1175/JCLI-D-15-0229.1. → pages 12, 13, 40,42, 44, 68, 72K. P. Murphy. Machine Learning: A Probabilistic Perspective. AdaptiveComputation and Machine Learning series. MIT Press, 2012. → pages 2J. C. Nash. Compact Numerical Methods for Computers: Linear Algebra andFunction Minimisation. Hilger, Bristol, 1979. → pages 48L. S. Ndlovu. Weather data generation and its use in estimatingevapotranspiration. PhD thesis, Washington State University, 1994. → pages71N. J. Nilsson. The quest for Artificial Intelligence. Cambridge University Press,2009. → pages 2Pacific Climate Impacts Consortium. Gridded Hydrologic Model Output.University of Victoria, January 2014. Downloaded fromhttp://tools.pacificclimate.org/dataportal/hydro model out/map/ on March 242015. → pages 8, 54, 63, 73, 74M. Pan, J. Sheffield, E. F. Wood, K. E. Mitchell, P. R. Houser, J. C. Schaake,A. Robock, D. Lohmann, B. Cosgrove, Q. Duan, L. Luo, R. W. Higgins, R. T.Pinker, and J. D. Tarpley. Snow process modeling in the North American LandData Assimilation System (NLDAS): 2. Evaluation of model simulated snowwater equivalent. Journal of Geophysical Research: Atmospheres, 108(D22),2003. → pages 72T. M. Pavelsky, S. Kapnick, and A. Hall. Accumulation and melt dynamics ofsnowpack from a multiresolution regional climate model in the central SierraNevada, California. Journal of Geophysical Research: Atmospheres, 116(D16),2011. → pages 35108L. Prechelt. Automatic early stopping using cross validation: quantifying thecriteria. Neural Networks, 11(4):761–767, 1998. → pages 83J. Pulliainen. Mapping of snow water equivalent and snow depth in boreal andsub-arctic zones by assimilating space-borne microwave radiometer data andground-based observations. Remote Sensing of Environment, 101(2):257–269,2006. → pages 12, 24J. T. Pulliainen, J. Grandell, and M. T. Hallikainen. HUT snow emission modeland its applicability to snow water equivalent retrieval. IEEE Transactions onGeoscience and Remote Sensing, 37(3):1378–1390, 1999.doi:10.1109/36.763302. → pages 7, 8, 43, 69R Core Team. R: A Language and Environment for Statistical Computing. RFoundation for Statistical Computing, Vienna, Austria, 2015. → pages 79A. Rango and J. Martinec. Revisiting the degree-day method for snowmeltcomputations. JAWRA Journal of the American Water Resources Association,31(4):657–669, 1995. → pages 71R. Reichle. The MERRA-Land Data Product. Global Modeling and AssimilationOffice, 2012. Note No. 3 (Version 1.2), 38 pp, retrieved fromhttps://gmao.gsfc.nasa.gov/pubs/docs/Reichle541.pdf on July 1, 2016. →pages 24R. H. Reichle, R. D. Koster, G. J. De Lannoy, B. A. Forman, Q. Liu, S. P.Mahanama, and A. Toure´. Assessment and enhancement of MERRA landsurface hydrology estimates. Journal of Climate, 24(24):6322–6338, 2011.doi:10.1175/JCLI-D-10-05033.1. → pages 7, 8, 12, 23, 24, 47T. Reichler and J. Kim. Uncertainties in the climate mean state of globalobservations, reanalyses, and the GFDL climate model. Journal of GeophysicalResearch: Atmospheres, 113(D5):D05106, 2008. ISSN 2156-2202.doi:10.1029/2007JD009278. → pages 42, 68C. A. Reynolds, T. J. Jackson, and W. J. Rawls. Estimating available watercontent by linking the FAO soil map of the world with global soil profiledatabases and pedo-transfer functions. In Proceedings of the AGU 1999 SpringConference, Boston, MA, 1999. → pages 77M. M. Rienecker, M. J. Suarez, R. Todling, J. Bacmeister, L. Takacs, H.-C. Liu,W. Gu, M. Sienkiewicz, R. D. Koster, R. Gelaro, I. Stajner, and J. E. Nielsen.The GEOS-5 data assimilation system - documentation of versions 5.0.1, 5.1.0,109and 5.2.0. Technical Report Series on Global Modeling and Data AssimilationVol. 27, 2008. → pages 77M. M. Rienecker, M. J. Suarez, R. Gelaro, R. Todling, J. Bacmeister, E. Liu,M. G. Bosilovich, S. D. Schubert, L. Takacs, G. K. Kim, S. Bloom, J. Y. Chen,D. Collins, A. Conaty, A. Da Silva, W. Gu, J. Joiner, R. D. Koster, R. Lucchesi,A. Molod, T. Owens, S. Pawson, P. Pegion, C. R. Redder, R. Reichle, F. R.Robertson, A. G. Ruddick, M. Sienkiewicz, and J. Woollen. MERRA: NASA’smodern-era retrospective analysis for research and applications. Journal ofClimate, 24(14):3624–3648, 2011. doi:10.1175/JCLI-D-11-00015.1. → pages6, 8, 12, 23, 47, 77S. J. Riley, S. D. DeGloria, and R. Elliot. A terrain ruggedness index thatquantifies topographic heterogeneity. Intermountain Journal of Sciences, 5(1-4):23–27, 1999. → pages 14, 73, 80, 93M. Rodell, P. Houser, U. Jambor, J. Gottschalck, K. Mitchell, C.-J. Meng,K. Arsenault, B. Cosgrove, J. Radakovich, M. Bosilovich, J. Entin, J. Walker,D. Lohmann, and D. Toll. The Global Land Data Assimilation System. Bulletinof the American Meteorological Society, 85(3):381–394, 2004.doi:10.1175/BAMS-85-3-381. → pages 7, 8, 11, 21, 47, 77F. Rosenblatt. The perceptron: A probabilistic model for information storage andorganization in the brain. Psychological Review, 65(6):386, 1958. → pages 2F. Rosenblatt. Principles of Neurodynamics. Spartan Books, 1962. → pages 2J. Rowland and R. Moore. Modelling solar irradiance on sloping surfaces underleafless deciduous forests. Agricultural and Forest Meteorology, 60(1-2):111–132, 1992. → pages 71D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internalrepresentations by error propagation. In Parallel Distributed Processing. MITPress, 1986. → pages 3M. Schnorbus, K. Bennett, and A. Werner. Quantifying the water resourceimpacts of mountain pine beetle and associated salvage harvest operationsacross a range of watershed scales: Hydrologic modeling of the Fraser RiverBasin. Technical Report BC-X-423, Pacific Forestry Centre, 2010. → pages 54M. Schnorbus, K. Bennett, A. Werner, and A. Berland. Hydrologic impacts ofclimate change in the Peace, Campbell and Columbia watersheds, BritishColumbia, Canada. Technical Report 157, Pacific Climate Impacts Consortium,University of Victoria, Victoria, BC, 2011. → pages 73110M. Schnorbus, A. Werner, and K. Bennett. Impacts of climate change in threehydrologic regimes in British Columbia, Canada. Hydrological Processes, 28(3):1170–1189, 2014. doi:10.1002/hyp.9661. → pages 54J. Sheffield, G. Goteti, and E. F. Wood. Development of a 50-year high-resolutionglobal dataset of meteorological forcings for land surface modeling. Journal ofClimate, 19(13):3088–3111, 2006. doi:10.1175/JCLI3790.1. → pages 7, 11,47, 77J. Shi, C. Xiong, and L. Jiang. Review of snow water equivalent microwaveremote sensing. Science China Earth Sciences, 59(4):731–745, 2016.doi:10.1007/s11430-015-5225-0. → pages 44, 70C. A. Smith, P. D. Sardeshmukh, et al. The effect of ENSO on the intraseasonalvariance of surface temperatures in winter. International Journal ofClimatology, 20(13):1543–1557, 2000. → pages 97A. M. Snauffer, W. W. Hsieh, and A. J. Cannon. Comparison of gridded snowwater equivalent products with in situ measurements in British Columbia,Canada. Journal of Hydrology, 541:714–726, 2016.doi:10.1016/j.jhydrol.2016.07.027. → pages vA. M. Snauffer, W. W. Hsieh, A. J. Cannon, and M. A. Schnorbus. Improvinggridded snow water equivalent products in British Columbia, Canada:multi-source data fusion by neural network models. The Cryosphere, 2017.doi:10.5194/tc-2017-56. in press. → pages v, 55, 56K. Stahl, R. Moore, J. Floyer, M. Asplin, and I. McKendry. Comparison ofapproaches for spatial interpolation of daily air temperature in a large regionwith complex topography and highly variable station density. Agricultural andForest Meteorology, 139(3):224–236, 2006a. → pages 71K. Stahl, R. D. Moore, and I. G. McKendry. The role of synoptic-scale circulationin the linkage between large-scale ocean–atmosphere indices and wintersurface climate in British Columbia, Canada. International Journal ofClimatology, 26(4):541–560, 2006b. → pages 1M. Sturm, J. Holmgren, and G. E. Liston. A seasonal snow cover classificationsystem for local to global applications. Journal of Climate, 8(5):1261–1283,1995. → pages 25M. Sturm, B. Taras, G. E. Liston, C. Derksen, T. Jonas, and J. Lea. Estimatingsnow water equivalent using snow depth data and climate classes. Journal ofHydrometeorology, 11(6):1380–1394, 2010. → pages 10111R. Sultana, K.-L. Hsu, J. Li, and S. Sorooshian. Evaluating the Utah EnergyBalance (UEB) snow model in the Noah land-surface model. Hydrology andEarth System Sciences, 18(9):3553, 2014. → pages 70, 72A. Tait. Estimation of snow water equivalent using passive microwave radiationdata. Remote Sensing of Environment, 64(3):286–291, 1998.doi:10.1016/S0034-4257(98)00005-4. → pages 43M. Takala, K. Luojus, J. Pulliainen, C. Derksen, J. Lemmetyinen, J.-P. Ka¨rna¨,J. Koskinen, and B. Bojkov. Estimating northern hemisphere snow waterequivalent for climate research through assimilation of space-borne radiometerdata and ground-based measurements. Remote Sensing of Environment, 115(12):3517–3529, 2011. doi:10.1016/j.rse.2011.08.014. → pages 24, 47M. Tedesco, J. Pulliainen, M. Takala, M. Hallikainen, and P. Pampaloni. Artificialneural network-based techniques for the retrieval of SWE and snow depth fromSSM/I data. Remote Sensing of Environment, 90(1):76–85, 2004.doi:10.1016/j.rse.2003.12.002. → pages 43, 69G. Thimm and E. Fiesler. High-order and multilayer perceptron initialization.IEEE Transactions on Neural Networks, 8(2):349–359, 1997.doi:10.1109/72.557673. → pages 48, 83J. Tong, S. J. Dery, P. L. Jackson, and C. Derksen. Testing snow water equivalentretrieval algorithms for passive microwave remote sensing in an alpinewatershed of western Canada. Canadian Journal of Remote Sensing, 36(sup1):S74–S86, 2010. doi:10.5589/m10-009. → pages 10, 39, 44, 69US Army Corps of Engineers. Runoff from snowmelt, 1960. US GovernmentPrinting. → pages 71J. VanDerWal, L. Falconi, S. Januchowski, L. Shoo, and C. Storlie. SDMTools:Species Distribution Modelling Tools: Tools for processing data associatedwith species distribution modelling exercises. R package version 1.1-221, 2014.→ pages 79A. Varhola, N. C. Coops, M. Weiler, and R. D. Moore. Forest canopy effects onsnow accumulation and ablation: An integrative review of empirical results.Journal of Hydrology, 392(3):219–233, 2010. → pages 11, 70M. T. Walter. Winter-time hydrologic modeling over a three-dimensionallandscape. 1995. → pages 70112M. T. Walter, E. S. Brooks, D. K. McCool, L. G. King, M. Molnau, and J. Boll.Process-based snowmelt modeling: does it require more input data thantemperature-index modeling? Journal of Hydrology, 300(1):65–75, 2005. →pages 8, 71, 80, 93T. Wang, A. Hamann, D. L. Spittlehouse, and T. Q. Murdock.ClimateWNA–high-resolution spatial climate data for western North America.Journal of Applied Meteorology and Climatology, 51(1):16–29, 2012. → pages8, 80F. J. Wentz. SSM/I version-7 calibration report. Remote Sensing Systems Tech.Rep, 11012:46, 2013. → pages 43P. J. Werbos. Beyond regression: New tools for prediction and analysis in thebehavioral sciences”. PhD thesis, Harvard University, 1974. → pages 3B. Widrow and M. Hoff. Adaptive switching circuits. In 1960 IRE WESCONConvention Record, pages 96–104. IRE, 1960. → pages 2K. Wolter and M. S. Timlin. El Nin˜o/Southern Oscillation behaviour since 1871as diagnosed in an extended multivariate ENSO index (MEI.ext). InternationalJournal of Climatology, 31(7):1074–1087, 2011. → pages 97A. W. Wood and D. P. Lettenmaier. A test bed for new seasonal hydrologicforecasting approaches in the western United States. Bulletin of the AmericanMeteorological Society, 87(12):1699–1712, 2006. → pages 10M. L. Wrzesien, M. T. Durand, T. M. Pavelsky, I. M. Howat, S. A. Margulis, andL. S. Huning. Comparison of methods to estimate snow water equivalent at themountain range scale: A case study of the california sierra nevada. Journal ofHydrometeorology, 18(4):1101–1119, 2017. → pages 72Y. Xue and B. A. Forman. Comparison of passive microwave brightnesstemperature prediction sensitivities over snow-covered land in North Americausing machine learning algorithms and the Advanced Microwave ScanningRadiometer. Remote Sensing of Environment, 170:153–165, 2015.doi:10.1016/j.rse.2015.09.009. → pages 53113


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items