UBC Theses and Dissertations

Air quality prediction by machine learning methods Peng, Huiping 2015

Full Text

Air Quality Prediction by Machine Learning Methods
by
Huiping Peng
B.Sc., Sun Yat-sen University, 2013
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in
The Faculty of Graduate and Postdoctoral Studies
(Atmospheric Science)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
October 2015
© Huiping Peng 2015

Abstract

As air pollution is a complex mixture of toxic components with considerable impact on humans, forecasting air pollution concentration emerges as a priority for improving life quality. In this study, air quality data (observational and numerical) were used to produce hourly spot concentration forecasts of ozone (O3), particulate matter 2.5 µm (PM2.5) and nitrogen dioxide (NO2), up to 48 hours for six stations across Canada: Vancouver, Edmonton, Winnipeg, Toronto, Montreal and Halifax. Using numerical data from an air quality model (GEM-MACH15) as predictors, forecast models for pollutant concentrations were built using multiple linear regression (MLR) and multi-layer perceptron neural networks (MLP NN). A relatively new method, the extreme learning machine (ELM), was also used to overcome the limitation of linear methods as well as the large computational demand of MLP NN. In operational forecasting, the continuous arrival of new data means frequent updating of the models is needed. This type of learning, called online sequential learning, is straightforward for MLR and ELM but not for MLP NN. Forecast performance of the online sequential MLR (OSMLR) and online sequential ELM (OSELM), together with stepwise MLR, all updated daily, was compared with MLP NN updated seasonally, and with the benchmark, updatable model output statistics (UMOS) from Environment Canada. Overall OSELM tended to slightly outperform the other models including UMOS, being most successful with ozone forecasts and least with PM2.5 forecasts.
MLP NN updated seasonally generally underperformed the linear models MLR and OSMLR, indicating the need to update a nonlinear model frequently.

Preface

This thesis contains research conducted by the candidate, Huiping Peng, under the supervision of Dr. William Hsieh and Dr. Alex Cannon. The air quality and meteorological data sets used in this study were provided by Dr. Andrew Teakles (Environment Canada). Fig 3.1 was reproduced using the station data from Environment Canada. The ELM and MLP NN models in this thesis were based on R packages developed by Aranildo Lima and Alex Cannon. The supervisory committee provided the original research topic, direction and critical feedback on the research methods. The development of statistical air quality models and the analysis of results were primarily the work of the candidate, but William Hsieh, Alex Cannon and Andrew Teakles contributed substantially by suggesting specialized analysis techniques, by helping to interpret the results and by carefully editing the manuscript. Currently no part of this thesis has been published, but a paper based on this thesis is in preparation for submission.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
1 Introduction
  1.1 Background
  1.2 Research Objectives
  1.3 Organization of Thesis
2 Literature Review
  2.1 Introduction
  2.2 Machine Learning Techniques
    2.2.1 Neural Network
    2.2.2 Extreme Learning Machine (ELM)
  2.3 Air Quality Forecasting Models
    2.3.1 Dispersion Models
    2.3.2 Photochemical Models
    2.3.3 Regression Models
    2.3.4 Neural Network Models
  2.4 Summary
3 Data
  3.1 Study Area
  3.2 Data Set
4 Methods and Models Set up
  4.1 Input Data Preprocessing
  4.2 Multiple Linear Regression (MLR)
  4.3 Online-Sequential Multiple Linear Regression (OS-MLR)
  4.4 Multi-layer Perceptron Neural Network (MLP NN)
  4.5 Extreme Learning Machine (ELM)
  4.6 Online-Sequential Extreme Learning Machine (OS-ELM)
  4.7 Model Evaluation
    4.7.1 Pearson Correlation Coefficient (r)
    4.7.2 Mean Absolute Error (MAE)
    4.7.3 MAE/MAD
    4.7.4 Root Mean Square Error (RMSE)
    4.7.5 Skill Score (SS)
  4.8 Summary
5 Results and Discussion
  5.1 Ozone
  5.2 PM2.5
  5.3 NO2
  5.4 Model Results with Antecedent Predictors
6 Conclusion
  6.1 Summary
  6.2 Future Research
Bibliography

List of Tables

3.1 Statistical properties of ozone concentration in 6 stations.
5.1 Statistical properties of ozone concentration (ppb) by station and season.
5.2 Statistical properties of top 10th percentile ozone concentration by station and season.
5.3 Statistical properties of PM2.5 concentration (µg/m3) by station and season.
5.4 Statistical properties of top 10th percentile PM2.5 concentration (µg/m3) by station and season.
5.5 Statistical properties of NO2 concentration (ppb) by station and season.
5.6 Statistical properties of top 10th percentile NO2 concentration (ppb) by station and season.

List of Figures

2.1 The general structure of a MLP NN model (Hsieh, 2009).
2.2 A diagram illustrating the problem of over-fitting. The dashed curve shows a good fit to noisy data (squares), while the solid curve illustrates over-fitting, where the fit is perfect on the training data (squares), but is poor on the test data (circles) (Hsieh and Tang, 1998).
3.1 Geographical distribution of the UMOS-AQ stations (red dots), with the six stations selected in this study shown as blue triangles.
5.1 Boxplot of the observed ozone values and the predicted values from five methods over all forecast lead times (1-48hr) at six stations.
5.2 Ozone forecast scores of different methods averaged over all forecast lead times (1-48hr) at the six stations.
5.3 Forecast correlation score as a function of the forecast lead time (1-48hr) from the five models displayed in a heat map (bottom panels) for forecasts initiated at 00 UTC (left) and 12 UTC (right) at six stations. Black vertical stripes indicate “missing values”, i.e. fewer than 100 data points were available for model training during 2009/07-2011/07. The mean diurnal ozone cycle is displayed in the top panels. 00 UTC corresponds to local time (daylight saving time) of 5 pm, 6 pm, 7 pm, 8 pm, 8 pm and 9 pm at Vancouver, Edmonton, Winnipeg, Toronto, Montreal and Halifax, respectively.
5.4 Ozone MAE skill score of different models by forecast hour, with forecasts initiated at 00 UTC (left column) and 12 UTC (right column). The panels are arranged in six rows, from Vancouver (top) to Halifax (bottom).
5.5 Boxplot of the ozone residuals (prediction−observation) by season and station. Outliers are not plotted but can be seen in Fig 5.7.
5.6 Ozone forecast scores (MAE, RMSE, MAE/MAD and r) by season and station.
5.7 Boxplot of ozone residuals (prediction−observation) (over all lead times) from the top 10th percentile by season and station.
5.8 Ozone forecast scores of top 10th percentile by season and station.
5.9 Boxplot of the PM2.5 observations and predictions by different methods at six stations.
5.10 PM2.5 forecast scores of different methods at the six test stations.
5.11 Mean diurnal PM2.5 concentration and heat map of the correlation score by model and station for forecast lead time 1-48hr and forecasts initiated at 00 UTC (left) and 12 UTC (right).
5.12 PM2.5 MAE skill score of different models by forecast hour for the six stations.
5.13 Boxplot of PM2.5 residuals (prediction−observation) by season and station. Outliers are not plotted but can be seen in Fig 5.15.
5.14 PM2.5 forecast scores by season and station.
5.15 Boxplot of PM2.5 residuals (prediction−observation) (over all lead times) from the top 10th percentile by season and station.
5.16 PM2.5 forecast scores of top 10th percentile by season and station.
5.17 Boxplot of the observed NO2 values and the predicted values from five models (over all forecast lead times) at six stations.
5.18 NO2 forecast scores of different methods in the six stations.
5.19 Mean diurnal NO2 concentration and heat map of the correlation score by model and station.
5.20 NO2 MAE skill score of different models by forecast hour for the six stations.
5.21 Boxplot of NO2 residuals (prediction−observation) by season and station. Outliers are not plotted but can be seen in Fig 5.23.
5.22 NO2 forecast scores by season and station.
5.23 Boxplot of NO2 residuals (prediction−observation) (over all lead times) from the top 10th percentile by season and station.
5.24 NO2 forecast scores of top 10th percentile by season and station.
5.25 Ozone forecast scores from models with and without antecedent predictors in the six stations. Models with antecedent predictors (OSELM-A and OSMLR-A) are in red, the original models without the extra predictors are in blue, and UMOS is in black.
5.26 Ozone MAE skill score from models with and without antecedent predictors by forecast hour.
5.27 Ozone top 10th percentile forecast scores from models with and without antecedent predictors by season and station.
5.28 PM2.5 forecast scores from models with and without antecedent predictors in the six stations.
5.29 PM2.5 MAE skill score from models with and without antecedent predictors by forecast hour.
5.30 PM2.5 top 10th percentile forecast scores from models with and without antecedent predictors by season and station.
5.31 NO2 forecast scores from models with and without antecedent predictors in the six stations.
5.32 NO2 MAE skill score from models with and without antecedent predictors by forecast hour.
5.33 NO2 top 10th percentile forecast scores from models with and without antecedent predictors by season and station.

List of Abbreviations

AQHI  Air quality health index
ELM  Extreme Learning Machine
GEM-MACH15  Global environmental multi-scale model - modeling air quality and chemistry with 15-km grid spacing
LST  Local sidereal time
MAD  Mean absolute deviation
MAE  Mean absolute error
MLP NN  Multilayer perceptron neural network
MLR  Multiple linear regression
NO2  Nitrogen dioxide
O3  Ozone
OSELM  Online sequential extreme learning machine
OSMLR  Online sequential multiple linear regression
PM2.5  Particulate matter 2.5 µm
r  Pearson correlation coefficient
RMSE  Root mean square error
SS  Skill score
UMOS  Updatable model output statistics
UTC  Coordinated universal time

Acknowledgements

I would like to thank Prof. William Hsieh for his never-ending guidance and support. The time that was put into this research could not have been done without his coordination and exceptional knowledge of Atmospheric Science and machine learning methods. His constant support and considerate attitude allowed the completion of this research without any pressure or tension. I would also like to thank Dr. Alex Cannon for his commitment to learning and advisement for this research. I would also like to acknowledge and thank the members of the UBC climate prediction group: Aranildo Rodrigues, Jian Jin and Andrew Snauffer. Not only were they my fellow students who helped and supported, but they were also my friends. I would like to thank Andrew Teakles and Jonathan Baik, for providing data and detailed instruction in each phase of the research, for approving my research and inspiring me through their excellent experience with Environment Canada. I would like to thank Prof. Roland Stull and Prof. Susan Allen, who taught me courses, and the knowledge gained was able to contribute to my research.
Lastly, I would like to thank my parents and all of my friends for their love and encouragement; they are always my best support no matter where I am.

Chapter 1
Introduction

With economic development and population rise in cities, environmental pollution problems involving air pollution, water pollution, noise and the shortage of land resources have attracted increasing attention. Among these, air pollution’s direct impact on human health through exposure to pollutants has resulted in an increased public awareness in both developing and developed countries (Kim et al., 2013; Kurt and Oktay, 2010; McGranahan and Murray, 2003). Air pollution is usually caused by energy production from power plants, industries, residential heating, fuel burning vehicles, natural disasters, etc. Human health concern is one of the important consequences of air pollution, especially in urban areas. The global warming from anthropogenic greenhouse gas emissions is a long-term consequence of air pollution (Nordiska, 2008; Ramanathan and Feng, 2009; Kumar and Goyal, 2013). Accurate air quality forecasting can reduce the effect of a pollution peak on the surrounding population and ecosystem, hence improving air quality forecasting is an important goal for society.

1.1 Background

Air pollution is the introduction of particulates, biological molecules, or other harmful materials into the Earth’s atmosphere, causing disease or death to humans, damage to other living organisms such as food crops, or damage to the natural or man-made environment. An air pollutant is a substance in the air that can have adverse effects on humans and the ecosystem. The substance can be solid particles, liquid droplets, or gases. Pollutants are classified as primary or secondary. Primary pollutants are usually produced from a process, such as ash from a volcanic eruption. Other examples include carbon monoxide gas from motor vehicle exhaust, or sulfur dioxide released from factories.
Secondary pollutants are not emitted directly. Rather, they form in the air when primary pollutants react or interact. Ground level ozone is a prominent example of a secondary pollutant. The six “criteria pollutants” are ground level ozone (O3), fine particulate matter (PM2.5), carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2), and lead, among which ground level O3, PM2.5 and NO2 (main component of NOx) are the most widespread health threats.

Ground level O3, a gaseous secondary air pollutant formed by complex chemical reactions between NOx and volatile organic compounds (VOCs) in the atmosphere, can have significant negative impacts on human health (Chen et al., 2007; Brauer and Brook, 1997). Prolonged exposure to O3 concentrations over a certain level may cause permanent lung damage, aggravated asthma, or other respiratory illnesses. Ground level O3 can also have detrimental effects on plants and ecosystems, including damage to plants, reductions of crop yield, and increase of vegetation vulnerability to disease (EPA, 2005).

Particle pollution (also called particulate matter or PM) is the term for a mixture of solid particles and liquid droplets found in the air. Some particles, such as dust, dirt, soot, or smoke, are large or dark enough to be seen with the naked eye. Others are so small they can only be detected using an electron microscope. Fine particulate matter (PM2.5), consisting of particles with diameter 2.5 µm or smaller, is an important pollutant among the criteria pollutants. The microscopic particles in PM2.5 can penetrate deeply into the lungs and cause health problems, including the decrease of lung function, development of chronic bronchitis and nonfatal heart attacks. Fine particles can be carried over long distances by wind and then deposited on ground or water through dry or wet deposition. The wet deposition is often acidic, as fine particles containing sulfuric acid contribute to rain acidity, or acid rain.
The effects of acid rain include changing the nutrient balance in water and soil, damaging sensitive forests and farm crops, and affecting the diversity of ecosystems. PM2.5 pollution is also the main cause of reduced visibility (haze) (EPA, 2005).

Nitrogen dioxide (NO2) is one of a group of highly reactive gases known as “nitrogen oxides” (NOx). The US Environmental Protection Agency (EPA) Ambient Air Quality Standard uses NO2 as the indicator for the larger group of nitrogen oxides. NO2 forms quickly from emissions of automobiles, power plants, and off-road equipment. In addition to contributing to the formation of ground-level ozone and fine particle pollution, current scientific evidence links short-term NO2 exposures, ranging from 30 minutes to 24 hours, with adverse respiratory effects including airway inflammation in healthy people and increased respiratory symptoms in people with asthma (EPA, 2005).

The Air Quality Health Index (AQHI) is a public information tool designed in Canada to help understand the impact of air quality on health. Basically, the AQHI is defined as an index or rating scale ranging from 1 to 10+, based on mortality studies, that indicates the level of health risk associated with local air quality (Chen and Copes, 2013). The higher the number, the greater the health risk and the need to take precautions. The formulation of the Canadian national AQHI is based on three-hour average concentrations of ground-level ozone (O3), nitrogen dioxide (NO2), and fine particulate matter (PM2.5). The AQHI is calculated on a community basis; each community may have one or more monitoring stations, and the average concentration of the 3 substances is calculated at each station within a community for the 3 preceding hours. AQHI is a meaningful index protecting residents on a daily basis from the negative effects of air pollution.
Our study focuses on predicting one-hour average concentrations of individual pollutants instead of the AQHI (or its maximum), as the formulation of AQHI is based on health related science and may evolve over time. Building a forecast system based on individual pollutants and one-hour average concentrations will make it more flexible to future changes in health indices. Our results can also be beneficial to external clients and meteorologists.

The concentration of air pollutants including ground level ozone, PM2.5 and NO2 varies depending on meteorological factors, the source of pollutants and the local topography (Dominick et al., 2012). Among these three factors, the one which most strongly influences variations in the ambient concentration of air pollutants is meteorological factors (Banerjee and Srivastava, 2009). Meteorological factors experience complex interactions between various processes such as emissions, transportation and chemical transformation, as well as wet and dry depositions (Seinfeld and Pandis, 1997; Demuzere et al., 2009). In addition, the spatial and temporal behavior of wind fields is affected by the surface roughness and differences in the thermal conditions (Oke et al., 1989; Roth, 2000), which further influence the dispersion of pollutants. For example, Revlett (1978) and Wolff and Lioy (1978) found that ambient ozone concentration not only depended on the ratio and reactivity of precursor species, but also on the state of the atmosphere: the amount of sunlight, ambient air temperature, relative humidity, wind speed, and mixed layer (ML) depth, while Tai (2012) found that daily variations in meteorology as described by the multiple linear regression (MLR) including nine predictor variables (temperature, relative humidity, precipitation, cloud cover, 850-hPa geopotential height, sea-level pressure tendency, wind speed and wind direction) could explain up to 50% of the daily PM2.5 variability in the US. Hence, meteorological factors play an important role in air pollutant concentrations, also making them difficult to model.

Most current air quality forecasting uses straightforward approaches like box models, Gaussian models and linear statistical models. Those models are easy to implement and allow for the rapid calculation of forecasts. However, they usually do not describe the interactions and non-linear relationships that control the transport and behaviour of pollutants in the atmosphere (Luecken et al., 2006). With these challenges, machine learning methods originating from the field of artificial intelligence have become popular in air quality forecasting and other atmospheric problems (Comrie, 1997; Hadjiiski and Hopke, 2000; Reich et al., 1999; Roadknight et al., 1997; Song and Hopke, 1996). For instance, several neural network (NN) models have already been used for air quality forecasting, in particular for forecasting hourly averages (Kolehmainen et al., 2001; Perez et al., 2000) and daily maxima (Perez, 2001). Although NN have advantages over traditional statistical methods in air quality forecasting, NN-based models still need to improve in order to achieve good prediction performance as effectively and efficiently as possible (Wang et al., 2003). A number of difficulties associated with NN hamper their effectiveness in air quality forecasting. These difficulties include computational expense, multiple local minima during optimization, over-fitting to noise in the data, etc. Furthermore, there are no general rules to determine the optimal size of network and learning parameters, which will greatly affect the prediction performance.

Another key consideration of forecast models is their updatability when doing real-time forecasting. For a forecast model, recently observed data should be used to refine the model.
This generally follows a procedure that links the discrepancy between model forecasts and the corresponding latest observation to all or some of the parameters in the model. Normally there are two ways for model updating: batch learning and online learning. Whenever new data are received, batch learning uses the past data together with the new data and performs a retraining of the model, whereas online learning only uses the new data to update the model. Batch learning can be computationally expensive in real-time forecasting as the procedure means repeatedly altering a representative set of parameters calibrated over a long historical record. Linear models are generally easy to update online (Wilson and Vallée, 2002), and even with batch learning, linear models are fast and easy to implement. As for non-linear methods, true online learning is difficult for many formulations such as the non-linear kernel method. Furthermore, frequent (daily) updating via batch learning is too expensive to implement as a non-linear model tends to have more parameters to train and the training process is much slower compared to linear models. Consequently, there is a need to develop non-linear updatable models for real-time forecasting. This study attempts to use the extreme learning machine (ELM) (Schmidt et al., 1992; Huang et al., 2006b), a non-linear machine learning algorithm using randomized neural networks, to forecast air pollutant concentrations in Canada. The ELM model has an architecture similar to the multi-layer perceptron (MLP) NN model, but it can be used for online sequential learning.
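
The difference between batch and online updating described above can be illustrated with a minimal Python sketch for a one-predictor linear regression. This is an illustration under stated assumptions, not the UMOS or OSMLR formulation used in the thesis: the class name OnlineSimpleRegression, the function batch_fit and the numbers are hypothetical. The online model keeps only running sums, so each update touches the new data alone, yet it arrives at the same coefficients as a batch refit over the full record.

```python
class OnlineSimpleRegression:
    """y = a + b*x fitted from running sums (hypothetical illustration)."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Online step: only the newly arrived (x, y) pairs are visited.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form least squares solution from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def batch_fit(xs, ys):
    """Batch learning: refit from scratch using the complete record."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Made-up data: a historical record followed by one new batch.
old_x, old_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
new_x, new_y = [4.0, 5.0], [9.1, 10.9]

model = OnlineSimpleRegression()
model.update(old_x, old_y)   # initial training
model.update(new_x, new_y)   # online update with the new data only
online_coef = model.coefficients()
batch_coef = batch_fit(old_x + new_x, old_y + new_y)
print(online_coef, batch_coef)
```

The two coefficient pairs agree up to floating-point rounding, while the cost of the online update depends only on the size of the new batch, not on the length of the historical record.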
ELM has been successfully used in different research areas and has been found to produce good generalization performance with generally less learning time compared with traditional gradient-based NN training algorithms (Huang et al., 2011; Lima et al., 2015).

1.2 Research Objectives

The research goal of this study is to develop a non-linear updatable model for real-time air quality forecasting, to potentially replace the updatable linear regression models currently being used. The ultimate goal is to improve air pollution forecasting in Canada and in other countries.

1.3 Organization of Thesis

Chapter 2 provides a literature review covering topics related to air quality forecasting, machine learning techniques and updatable model output statistics (UMOS), a linear online updating model from Environment Canada. Background theory on various machine learning and air quality topics will be covered. The reviewed air quality forecasting studies as well as the modeling techniques will be discussed in terms of how they can be applied to this research. Chapter 3 describes the study area and the data sets used in this study. Chapter 4 outlines the methods used to conduct this research and describes the developed forecast models and evaluation methods. In Chapter 5 the results from all developed forecast models for each pollutant are discussed in detail. The thesis concludes with Chapter 6 where the original research objectives are addressed and recommendations for future research are made.

Chapter 2
Literature Review

2.1 Introduction

Air pollution is a major threat to health and exerts a wide range of impacts on biological and economic systems. The purpose of this literature review is to justify the research objectives of this study in light of previous work by investigating past air quality prediction studies and determining where future research is needed. Literature related to air quality prediction and various types of machine learning methods used in this study are reviewed.
Machine learning theory and past applications are examined to show why these methods are likely to perform well in air quality forecasting.

2.2 Machine Learning Techniques

Machine learning is a major sub-field in computational intelligence (also called artificial intelligence). Its main objective is to use computational methods to extract information from data. Machine learning has a wide spectrum of applications including handwriting and speech recognition, robotics and computer games, natural language processing, brain-machine interface and so on. In the environmental sciences, machine learning methods have been heavily used in data processing, model emulation, weather and climate prediction, air quality forecasting, and oceanographic and hydrological forecasting (Hsieh, 2009).

2.2.1 Neural Network

Neural network (NN) methods were originally developed from investigations into human brain function and they are adaptive systems that change as they learn (Hsieh and Tang, 1998). There are many types of NN models; the most common one is the multi-layer perceptron (MLP) NN model shown in Fig 2.1.

Figure 2.1: The general structure of a MLP NN model (Hsieh, 2009).

The input variables $x_i$ are mapped to a layer of intermediate variables known as “hidden neurons” $h_j$ by

$$h_j = f\Big(\sum_i w_{ji} x_i + b_j\Big), \tag{2.1}$$

and then onto the output variables $y_k$ by

$$y_k = g\Big(\sum_j \beta_{kj} h_j + \beta_{k0}\Big), \tag{2.2}$$

where $f$ and $g$ are “activation” functions in the hidden layer and the output layer, respectively. Normally $f$ can be the logistic sigmoidal or hyperbolic tangent function and $g$ can be linear in NN models for regression. $w_{ji}$ and $\beta_{kj}$ are weight parameters and $b_j$ and $\beta_{k0}$ are offset parameters. Their optimal values are learned by model training (Hsieh and Tang, 1998) where the mean squared error of the model output is minimized.

Numerous studies show NN models have good forecasting performance. Walter et al. (1998) used NN methods to simulate the observed global (and hemispheric) annual mean surface air temperature variations during 1874-1993 using anthropogenic and natural forcing mechanisms as predictors. The two anthropogenic forcings were equivalent CO2 concentrations and tropospheric sulfate aerosol concentrations. The natural forcings were volcanism, solar activity and ENSO (El Niño Southern Oscillation). The NN explained up to 83% of the observed temperature variance, significantly more than by multiple regression analysis. Hewitson and Crane (1996) used MLP NN for precipitation forecast with predictors from the general circulation model (GCM) atmospheric data over southern Africa and the surrounding ocean. The six leading PCs (principal components) of the sea level pressure field and the seven leading PCs of the 500 hPa geopotential height field from the GCM were used as inputs to the NN. Cavazos (1997) also used MLP NN to downscale GCM synoptic-scale atmospheric circulation to local 1° × 1° gridded winter daily precipitation over north-eastern Mexico and found the model was able to reproduce the phase and, to some degree, the amplitude of large rainfall events. Marzban and Stumpf (1996) trained an MLP to predict the existence of tornadoes. The approach outperformed other techniques including discriminant analysis, logistic regression and a rule-based algorithm. Neural networks have also been used to solve hydrological problems, such as the prediction of reservoir inflows, stream flow forecasting, downscaling precipitation, and prediction of water resource variables (e.g., flow, water level, nitrate, salinity and suspended sediment concentration) (Tripathi et al., 2006; Chen et al., 2010; Maier et al., 2010; Cannon, 2012b; Rasouli et al., 2012; Thirumalaiah and Deo, 1998).
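
For reference, the MLP mapping in Eqs. (2.1) and (2.2) can be written out as a short Python sketch. It assumes f = tanh and a linear g, which are among the choices named in the text; the function name mlp_forward, the argument layout and the numbers are hypothetical and are not taken from the R packages used in the thesis.

```python
import math


def mlp_forward(x, w, b, beta, beta0):
    """Evaluate a single-hidden-layer MLP following Eqs. (2.1) and (2.2).

    x     : inputs x_i
    w     : hidden-layer weights, w[j][i] = w_ji
    b     : hidden-layer offsets b_j
    beta  : output-layer weights, beta[k][j] = beta_kj
    beta0 : output-layer offsets beta_k0
    """
    # Eq. (2.1): h_j = f(sum_i w_ji x_i + b_j), with f = tanh (assumed choice).
    h = [math.tanh(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + b_j)
         for w_j, b_j in zip(w, b)]
    # Eq. (2.2): y_k = g(sum_j beta_kj h_j + beta_k0), with g linear.
    y = [sum(beta_kj * h_j for beta_kj, h_j in zip(beta_k, h)) + beta_k0
         for beta_k, beta_k0 in zip(beta, beta0)]
    return h, y


# Illustrative numbers only: 2 inputs, 2 hidden neurons, 1 output.
h, y = mlp_forward(
    x=[0.5, -1.0],
    w=[[1.0, 0.0], [0.0, 0.0]],
    b=[0.0, 0.0],
    beta=[[2.0, 3.0]],
    beta0=[1.0],
)
print(h, y)
```

Training would adjust w, b, beta and beta0 to minimize the mean squared error of y against the targets; the sketch covers only the forward mapping.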
Other applications of neural network methods include remote sensing and GIS related activities, air quality management (Boznar et al., 1993), adsorbent beds design (Basheer and Najjar, 1996), and hazardous waste management.

Researchers have shown that neural networks have salient advantages over traditional statistical methods in environmental forecasting problems. However, a number of difficulties associated with NN hamper their effectiveness, efficiency and general acceptability (Wang et al., 2003). One of the main challenges in developing an NN model is how to address the problem of over-fitting. An over-fitted NN model could fit the data very well during training, but produce poor forecast results during testing (Hsieh and Tang, 1998). Over-fitting occurs when a model fits to the noise in the data, so that it will not generalize well to new data sets, as shown in Fig 2.2.

Figure 2.2: A diagram illustrating the problem of over-fitting. The dashed curve shows a good fit to noisy data (squares), while the solid curve illustrates over-fitting, where the fit is perfect on the training data (squares), but is poor on the test data (circles) (Hsieh and Tang, 1998).

Typically regularization (i.e. the use of weight penalty) is used to prevent over-fitting (Golub et al., 1979; Haber and Oldenburg, 2000). This usually requires some of the training data to be used as validation data to determine the optimal regularization parameter. Yuval (2000) introduced generalized cross-validation (GCV) to control overfitting/underfitting automatically in MLP NN and applied the method to forecasting the tropical Pacific SST anomalies. Yuval (2001) used bootstrap resampling of the data to generate an ensemble of MLP NN models and used the ensemble spread to estimate the forecast uncertainty.

Another issue is the computational expense involved during the training process in neural networks.
To train the NN model to learn from the target data, we need to minimize the objective function J, defined here to be the mean squared error (MSE) between the model output y and the target t. Normally the back-propagation algorithm is used to perform the training tasks, using a gradient-descent approach to reduce the MSE iteratively (Hsieh, 2009), which could be time-consuming.

2.2.2 Extreme Learning Machine (ELM)

The extreme learning machine (ELM) is a randomized neural network method proposed by Schmidt et al. (1992) and popularized by Huang et al. (2006b). The ELM algorithm has the same architecture as a single-hidden layer feed-forward neural network (SLFN), but it is generally fast to train. The ELM randomly chooses the weights leading to the hidden nodes or neurons (HN) and analytically determines the weights at the output layer by solving a linear least squares problem. The only hyper-parameter to be tuned in the ELM is the number of HN (Lima et al., 2015). Extensions to ELM include online sequential ELM (OS-ELM) (Liang et al., 2006), incremental ELM (I-ELM) (Huang et al., 2006a; Huang and Chen, 2007, 2008), ELM ensembles, pruning ELM (P-ELM) (Rong et al., 2008) and error minimized ELM (EM-ELM) (Feng et al., 2009). In online sequential learning, new data arrive continuously and the model is repeatedly updated with the new data. OS-ELM is readily updated using only the new data, without the need to retrain using the complete historical record.

ELM has also been successfully used in different research areas (Huang et al., 2011). An integration of several ELMs was proposed by Sun et al. (2008) to predict future sales amounts. Several ELM networks were connected in parallel and the average of the ELM outputs was used as the final predicted sales amount, with better generalization performance. Heeswijk et al.
(2009) investigated the adaptive ensemble models of ELM on the application of one-step ahead prediction in stationary and non-stationary time series. They found that the method worked well on stationary time series and the adaptive ensemble model achieved a test error comparable to the best methods on the non-stationary time series, while keeping adaptivity with low computational cost. Handoko et al. (2006) found that the ELM was as good as, if not better than, the MLP NN in terms of computing time, accuracy deviations across experiments and prevention of overfitting. Using the ELM as a mechanism for learning the stored digital elevation information to allow multi-resolution access in terrain models, Yeu et al. (2006) found that to achieve the same MSE during access, the memory needed in ELM was significantly lower than that needed by Delaunay triangulation (DT). Additionally, the offline training time for the ELM network was much less than that for the MLP NN, DT and support vector machines (SVM).

As a randomized neural network, ELM is controlled by hyper-parameters such as the range of the random weights and the number of HN. The optimal number of HN is problem dependent and unknown in advance. We have to ensure the network structure is balanced between generalization ability and network complexity. Low network complexity (i.e. too few HN) might be unable to capture the true non-linear relationship, whereas too high a network complexity might decrease model generalization ability due to overfitting to noise in the data, and increase the model training time. In general, the number of HN is selected empirically based on model performance over independent validation data not used in model training. For an ELM, the optimal number of HN found can be much greater (sometimes by orders of magnitude) than for an MLP using iterative non-linear optimization, so an automatic procedure is needed to select the number of HN. Lima et al.
(2015) used the hill climbing method to find the optimal number of HN and the test results on nine environmental regression problems showed that among the non-linear models, the ELM method, with often the fastest computing time for model training, tended to perform well in prediction skills.

Random initialization of the weights is another important prerequisite for good convergence of NN models. A balance must exist to ensure that the activation function does not remain linear nor become saturated near the asymptotic limits of -1 and 1 in the case of a hyperbolic tangent function. If the range of random weight distribution is too small, both activation and error signals will die out on their way through the network. If it is too large, the saturated activation function will block the back-propagated error signals from passing through the node (Lima et al., 2015). Normally the range of weight interval is simply a constant, but Parviainen and Riihimaki (2013) raised questions about the meaningfulness of choosing model complexity based on HN only and also found that using an appropriate weight range can improve ELM performance, achieving a similar effect as regularization in traditional neural network models.

2.3 Air Quality Forecasting Models

An air quality model is a numerical tool used to describe the causal relationship between emissions, meteorology, atmospheric concentration, deposition and other factors. It can give a complete deterministic description of the air quality problem (Nguyen, 2014). The most commonly used air quality models include dispersion models, photochemical models and regression models. Various neural network models, as non-linear regression models, have also been shown to be effective in air quality forecasting. In this section, different models and their applications will be introduced.

2.3.1 Dispersion Models

Dispersion models normally use mathematical formulations to simulate the atmospheric process after pollutants were emitted by a source.
Data needed for dispersion models vary in their complexity. At a minimum, most of the models require meteorological data, emissions data, and details about the facilities in question (such as stack height, gas exit velocity, etc.). Some of the more complex models require topographic information, individual chemical characteristics and land use data. The output is the predicted concentration at selected downwind receptor locations. There are different types of dispersion models with specific requirements and spatial scales. The most commonly used dispersion models are the box model, Gaussian plume model, Lagrangian model, Eulerian model, computational fluid dynamics model and Gaussian puff model. The processes included in those models are building wake effects, topography, street canyon, intersections, plume rise and chemistry (Holmes and Morawska, 2006).

Two of the most common models used to calculate the dispersion of vehicle emissions are CALINE4 (California Department of Transportation) and HIWAY2 (US EPA). Both models are based on a Gaussian plume model. Yura et al. (2007) explored the range of CALINE4’s PM2.5 modeling capabilities by comparing previously collected PM2.5 data with CALINE4 predicted values. Two sampling sites, a suburban site and an urban site, were used for this study. Model predicted concentrations were graphed against observed concentrations and evaluated against the criterion that 75% of the points fall within the factor-of-two prediction envelope. However, only the suburban site results by CALINE4 met the criterion. For the urban site, several factors including street canyon effects likely contributed to an inaccuracy of the emission factors used in CALINE4, and therefore, to the overall CALINE4 predictions. The study suggested that CALINE4 might not perform well in densely populated areas and differences in topography may be a decisive factor in determining when CALINE4 may be applicable to modeling PM2.5.

Colvile et al.
(2002) applied the ADMS-urban atmospheric dispersion model system to review air quality in central London in 1996-1997. The model performance was validated by monitoring data and showed that model precision was 10% with 0-12% bias for the annual mean NO2 and PM10 concentrations. Wallace and Kanaroglou (2008) used the Integrated Model of urban Land-use and Transportation for Environmental Analysis to estimate emissions and concentrations of NOx from traffic sources in the Hamilton census metropolitan area. The results showed a prominent triangular area of high pollution, defined by major roads and highways along the Hamilton Harbour during peak hour. The resulting dispersion surfaces characterized the spatial distribution of traffic emissions and thus provide a way for assessing population exposure over the Hamilton area.

Although dispersion models consider many processes that affect air pollutant concentration, they have some limitations such as the simplified treatment of turbulence and meteorology, and cannot take into account any formation of pollutants. Even NOx and SOx, which are fundamental to determining particle and ozone concentrations, are often only calculated using a simple exponential decay (Holmes and Morawska, 2006). Gaussian models have also been shown to consistently over-predict concentrations in low wind conditions (Benson, 1984), as Gaussian models are not designed to model the dispersion under low wind conditions.

2.3.2 Photochemical Models

Photochemical models have become widely utilized as tools in air pollution control strategies. Photochemical models simulate the changes of pollutant concentrations in the atmosphere using a set of mathematical equations characterizing the chemical and physical processes in the atmosphere.
These models are applied at multiple spatial scales, from local and regional to national and global (Nguyen, 2014).

Photochemical models have been formulated in both the Lagrangian and Eulerian reference frames (Russell and Dennis, 2000). Eulerian models include both single box models and multi-dimensional grid-based models. Box models were used early and are still used today in studies focusing on atmospheric chemistry alone. The limitation of box models is a lack of significant physical realism such as horizontal and vertical transport, and spatial variation. Grid models are potentially the most powerful photochemical models (Hansen et al., 28; Dennis et al., 1996), but are also the most computationally intensive. They solve a finite approximation by dividing the modeling region into a large number of cells, horizontally and vertically, which interact with each other to simulate the various processes that affect the evolution of pollutant concentrations, including chemistry, diffusion, advection, sedimentation (for particles), and deposition.

Photochemical models have been widely used to assess the relative importance of VOC and NOx controls in reducing ozone levels. Milford et al. (1989) used the CIT model to show the spatial variation of ozone isopleths and found a negative response of ozone to NOx controls in the downtown region of Los Angeles. Flemming et al. (2001) have employed the regional Eulerian model with 3 chemistry mechanisms (REM3) to operationally forecast ozone since 1997 at the Freie University, Berlin. The model has been used for making 1, 2, and 3 day advance ozone forecasts with data over Germany from 1997 to 1999. The resulting correlation coefficient (r) ranged from 0.77 to 0.90. The disadvantage of this model was that it tended to underestimate the low ozone concentrations. Wotawa et al. (1998) developed a Lagrangian photochemical box model for providing ozone forecasts for Vienna, Austria.
This model consisted of up to 8 vertical and up to 5 horizontal boxes. It simulated emission, chemical reactions, horizontal diffusion, vertical diffusion, dry deposition, wet deposition and synoptic scale vertical exchange. Model input data included a trajectory term, which was calculated using forecast meteorological data. The model predictions for the 1995 O3 season underestimated O3 concentrations on most days and r was greater than 0.6 for most of the study cases.

2.3.3 Regression Models

Both linear regression and non-linear regression models have been employed for air quality forecasting. The general purpose of a linear regression model is to learn about the linear relationship between several independent variables (predictors) and a dependent variable (predictand).

Prybutok et al. (2000) built a simple linear regression model for forecasting the daily peak O3 concentration in Houston. The final model used four meteorological and O3 precursor parameters: O3 concentration at 9:00 a.m., maximum daily temperature, average NO2 concentration between 6:00 a.m. and 9:00 a.m. and average surface wind speed between 6:00 a.m. and 9:00 a.m. The correlation coefficient r of this model was 0.47. Chaloulakou et al. (1999) proposed a multiple regression model to forecast the next day’s hourly maximum O3 concentration in Athens, Greece. The set of input variables consisted of eight meteorological parameters and three persistence variables, which were the hourly maximum O3 concentrations of the previous three days. Testing this linear regression model on four separate test data sets, the mean absolute error (MAE) ranged from 19.4% to 33.0% of the corresponding average O3 concentrations.

Non-linear regression models are superior to simple linear regression models because they capture the non-linear relationships between air pollutant and meteorological parameters. Bloomfield et al.
(1996) described a non-linear regression model to explain the effects of meteorology on O3 in the Chicago area. The model input variables consisted of a seasonal term, a linear annual trend term, and twelve meteorological variables. The observed ozone and meteorological data in 1981-1991 were divided into subsets for model development and validation. The model errors were within ±5 ppb about half the time, and within ±16 ppb about 95% of the time. Bloomfield et al. (1996) demonstrated that the meteorological data accounted for at least 50% of the ozone concentration variance.

As the reference model in this thesis, the updatable model output statistics - air quality (UMOS-AQ) system applies multiple linear regression (MLR) to forecast air quality predictands. UMOS-AQ is a statistical post-processing system for air quality forecasting in Canada. The current Environment Canada (EC) operational AQ forecast model is the GEM-MACH15 (global environmental multi-scale model - modeling air quality and chemistry with 15-km grid spacing). GEM-MACH15 runs twice daily at 00 and 12 UTC to give 48-hour AQ forecasts (Anselmo et al., 2010). UMOS-AQ is based on post-processing the GEM-MACH15 forecasts. The UMOS post-processing package has been used by EC to forecast meteorological predictands such as surface temperature and probability of precipitation since 1995 (Wilson and Vallée, 2002, 2003). UMOS-AQ uses the existing UMOS framework and became operational in July 2010. Three predictands are currently considered by UMOS-AQ: O3, PM2.5 and NO2. Possible MLR predictors include O3, PM2.5 and NO2 hourly concentrations at a station for each hour of the previous day (i.e., persistence) plus 84 other chemical, meteorological, and physical predictors (e.g., solar flux, sine of scaled Julian day). Two seasons (summer and winter) are considered with a transitional period of 6 weeks.
A minimum of 250 observation-model pairs per season are needed to generate robust MLR equations and the equations are regenerated with the latest model data every week (Moran et al., 2014). One of UMOS-AQ’s main advantages is its ability to adapt to model changes, as its equations are updated four times per month. However, UMOS-AQ can only be constructed for locations where historical AQ measurements are available (Wilson and Vallée, 2002; Moran et al., 2014). This becomes a limitation because most AQ stations are not co-located with public weather forecast stations. A solution is to blend the UMOS-AQ point forecasts with GEM-MACH15 gridded forecast fields. This is now being done by optimal interpolation (OI) using MIST (Moteur d’Interpolation STatistique), an EC statistical interpolation package that uses the OI algorithm described by Mahfouf et al. (2007).

2.3.4 Neural Network Models

Although many approaches such as box models, Gaussian plume models, persistence and regression models are commonly applied to characterize and forecast air pollutant concentrations, they are relatively straightforward with significant simplifications (Luecken et al., 2006).

A promising alternative to these models is the neural network model (Lal and Tripathy, 2012; Nejadkoorki and Baroutian, 2012; Gardner and Dorling, 1998). Several NN models have already been used for forecasting the concentrations of different air pollutants. Gardner and Dorling (2000) used MLP NN to forecast the hourly ozone concentration at five cities in the UK and found that NN outperformed both CART (classification and regression tree) and linear regression (LR). The predictors used included the amount of low cloud, base of lowest cloud, visibility, dry bulb temperature, vapour pressure, wind speed and direction.
To account for seasonal effects, they had two extra predictors in model 2, sin(2πd/365) and cos(2πd/365), with d the Julian day of the year, thereby informing the model where in the annual cycle the forecast was made. Ballester et al. (2002) used a finite impulse response NN model to make 1-day advance predictions of 8-hr average ozone concentrations in eastern Spain. The input variables were 2-h lagged observed values of air quality and meteorological inputs. The models were evaluated using data from the 1996 to 1999 ozone seasons (July to September). The statistics of the model fits for three sampling sites ranged from 6.39 to 8.8 ppb for MAE and from 0.73 to 0.79 for R.

For particulate matter (PM), Kukkonen et al. (2003) compared the performance of five different NN models for the prediction of PM10 concentrations in Helsinki. Results obtained showed that NN models performed better than linear models. In addition, Perez et al. (2000) constructed an NN PM2.5 forecast model to make predictions of hourly averaged PM2.5 concentrations in the downtown area of Santiago, Chile. Three forecast models, NN, LR, and persistence, were developed to predict PM2.5 concentrations at any hour of the day, using the 24 hourly averaged concentrations measured on the previous day as the input variables. The normalized MAE (NMAE) of the predictions for 1994-1995 ozone season (May 1 to September 30) ranged from 30% to 60%. These authors found that PM2.5 formation strongly depended on weather conditions, with the PM2.5 concentrations negatively correlated with wind speed and relative humidity. NO2 concentrations have also been investigated using NN (Gardner and Dorling, 1999).

Several authors compared different approaches when applied to different pollutants and prediction time lags (Boznar et al., 1993; Lu and Wang, 2005; Yi and Prybutok, 2002).
In their overview of NN applications in the atmospheric sciences, Gardner and Dorling (1998) concluded that NN generally gives as good or better results than linear methods.

2.4 Summary

Studies from the fields of machine learning and air quality models show that much effort has been put into air quality forecasting, including the use of various machine learning methods. Machine learning methods have been widely used in environmental science problems and, based on the results of previous studies, applications of the MLP NN tend to provide some advantages over linear methods. In air quality forecasting, machine learning methods are promising when compared with the linear regression model and the photochemical dispersion model. The ELM method has been introduced to overcome some of the drawbacks in the popular MLP NN model, e.g. in computing time and the local minima problem.

Chapter 3
Data

3.1 Study Area

The updatable model output statistics - air quality (UMOS-AQ) model uses observations from more than 250 stations across Canada (Fig 3.1). The stations belong to the National Air Pollution Surveillance Network (NAPS), where each station measures all or a combination of the concentrations of ozone (O3), fine particulates (PM2.5) and nitrogen dioxide (NO2) (Antonopoulos et al., 2012). Six stations across Canada are used for model testing: Vancouver International Airport (British Columbia), Edmonton Central (Alberta), Winnipeg (Manitoba), Toronto Downtown (Ontario), Montreal Airport (Quebec) and Halifax (Nova Scotia). These six stations include the largest cities of Canada, the coastal cities and the major center for the oil and gas industry in Canada. They all have different topography, weather conditions and major pollution sources.

3.2 Data Set

The data set used in this study covers the period 2009/07-2014/07 and was provided by the UMOS-AQ model of Environment Canada.
The first two years of data (2009/07-2011/07) were used for model training and validation and the final three years (2011/08-2014/07) were used for model testing as well as model updating. The model will be evaluated using these 3-year data sets. As mentioned before, UMOS-AQ is a post-processing system that combines multiple sources of information: AQ forecasts, meteorological forecasts, AQ measurements and physical variables. Hence, the input data sets consisted of observational and numerical data.

Figure 3.1: Geographical distribution of the UMOS-AQ stations (red dots), with the six stations selected in this study (Vancouver, Edmonton, Winnipeg, Toronto, Montreal and Halifax) shown as blue triangles.

The observational air pollutant data were from automated near-real-time (NRT) hourly reports of local O3, PM2.5, and NO2 concentrations from around 250 urban and rural AQ measurement stations located across Canada. The near-real-time data have uncertainty as they have not been verified and may not adequately reflect representative air quality values. Numerical data came from an air quality model (GEM-MACH15), which was used to produce a set of direct and calculated predictors for our machine learning models. GEM-MACH15 produced both chemical and meteorological fields with its twice daily 48-h forecasts starting at 00 and 12 UTC. Using these predictors, our models produced hourly spot concentration forecasts up to 48 h.

The predictors used in this study include persistence predictors, meteorological predictors, and chemical and physical predictors. The persistence predictors included observed ozone, PM2.5, and NO2 concentrations at the time the model was initiated (00 UTC and 12 UTC).
Meteorological predictors consisted of model dry bulb temperature, wind component, geopotential height, relative humidity, dew point depression, surface model pressure, cumulative precipitation, average rate of snowfall in water equivalent, cloud cover, model boundary layer height, wind speed and dew point temperature. Chemical variables were the maximum and average ozone, PM2.5, and NO2 concentrations over 3-, 6- and 24-hour intervals. Physical variables included the downward solar flux, calculated mixing height, day of week and sine of the Julian day.

Three predictands (target data) were considered, namely the observed ozone, PM2.5, and NO2 hourly average concentrations at the six stations across Canada. Nine antecedent predictors were also considered, i.e. the following three quantities for each pollutant: 1) the pollutant concentration valid at the same local standard time (LST) as the forecast but from 24 hours prior to the model initialization, 2) the maximum hourly average pollutant concentration observed within the 24 hour period prior to the model initialization, 3) the minimum hourly average pollutant concentration observed within the 24 hour period prior to the model initialization. As UMOS did not include antecedent predictors, we did not use the antecedent predictors when comparing against the UMOS benchmark.

Some statistical properties of the ozone, PM2.5, and NO2 concentrations during the study period (2009/07 - 2014/07) are shown in Table 3.1. The mean and standard deviation were calculated over the 5-year period, while the maximum values were the median of each year’s maximum concentration, with the median used to avoid influence from extreme events.

Station      Ozone (ppb)               PM2.5 (µg/m3)             NO2 (ppb)
             Mean   Std.Dev.   Max     Mean   Std.Dev.   Max     Mean   Std.Dev.   Max
Vancouver    16.0   10.3       49.0    4.4    3.0        23.0    14.5   7.5        45.5
Edmonton     18.2   10.4       63.0    9.9    6.6        68.0    17.7   9.2        58.5
Winnipeg     25.9   11.7       66.5    5.8    3.7        37.0    6.5    6.3        39.5
Toronto      25.4   11.4       78.0    6.8    5.0        37.0    14.9   6.8        45.0
Montreal     23.6   10.9       67.0    9.5    6.1        47.0    9.8    6.9        52.0
Halifax      21.5   9.7        53.0    5.4    3.2        33.5    3.2    3.1        18.0

Table 3.1: Statistical properties of ozone, PM2.5 and NO2 concentrations at the six stations.

Table 3.1 shows that Toronto and Winnipeg have the highest mean ozone values. All six stations have less than 10 µg/m3 mean PM2.5 concentration, with Montreal and Edmonton being about 30% larger than the others. The NO2 mean value varies greatly among the six stations, as the mean in Halifax is only 3.2 ppb, whereas Toronto, Vancouver and Edmonton all have means over 14 ppb. These statistics cannot provide a full assessment of the air pollutant concentration in each city, as the statistics may be strongly influenced by the location of each station within the city.

Chapter 4
Methods and Models Set up

In this chapter, several different methods and models are introduced. Each method produces a different model per station per forecast hour per pollutant. The model development can be separated into two phases: 1) training and 2) testing and updating. Models are first trained and validated using 2-year data sets (2009/07-2011/07) and, after this initialization phase, the models are used to predict the air pollutant concentration with a newly arrived single datum or a chunk of data during 2011/08-2014/07. Model updating is conducted by either a batch learning algorithm or an online-sequential learning algorithm using the newly arrived data.

When data become available, batch learning performs a complete retraining of the model using all past data plus the new data. It can be used to update the multiple linear regression (MLR), multi-layer perceptron neural network (MLP NN) and extreme learning machine (ELM) methods. Depending on computation resources, batch updating can be applied daily, monthly or seasonally.
Batch learning can be computationally intensive for nonlinear models as it may involve many iterations through the training data. There are many applications where online-sequential learning algorithms are preferred over batch learning algorithms, as sequential learning algorithms do not require retraining with the full dataset whenever new data arrive (Liang et al., 2006).

In a versatile online-sequential learning algorithm, the training data are presented sequentially (singly or as a chunk of data) to the learning algorithm. At any time, only the newly arrived data (instead of all past data) are needed to update the model. The new data, once learned by the model, can be discarded (Liang et al., 2006). The learning algorithm has no prior knowledge as to how many training data will be presented. A comparison will be made between the online-sequential extreme learning machine (OS-ELM), the online-sequential multiple linear regression (OS-MLR) and the reference model, UMOS, which is also online-sequential.

4.1 Input Data Preprocessing

Without properly transforming or scaling the input data, machine learning methods may be trained inefficiently and the resulting model may perform poorly. If the input variables in the training dataset vary greatly in magnitude, the model weights have to adapt to the differences. The resulting weights will also have a large spread in magnitude, rendering the training algorithm inefficient (Rasouli et al., 2012). Input data preprocessing/scaling is an efficient way to solve the problem. The commonly used scaling methods include: (i) linear transformation, (ii) statistical standardization, and (iii) nonlinear transformation (e.g. the logarithmic transformation). Input data in this study are standardized, i.e. data have the mean value subtracted, then are divided by the standard deviation, yielding variables with zero mean and unit standard deviation.
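The standardization step described above can be sketched as follows. This is a minimal illustration only, not the code used in this study: the function names, the toy temperature values, the use of the population standard deviation, and the choice to reuse the training-period mean and standard deviation for later data are all assumptions made for the example.

```python
from statistics import mean, pstdev


def fit_standardizer(column):
    """Estimate the mean and standard deviation of one predictor
    from the training data (population std is an assumption here)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0.0:
        sigma = 1.0  # constant predictor: avoid division by zero
    return mu, sigma


def standardize(column, mu, sigma):
    """Subtract the mean, then divide by the standard deviation."""
    return [(x - mu) / sigma for x in column]


# Hypothetical training values of one predictor (e.g. temperature).
train_temp = [12.0, 15.0, 9.0, 18.0, 21.0]
mu, sigma = fit_standardizer(train_temp)

# Training data end up with zero mean and unit standard deviation.
z_train = standardize(train_temp, mu, sigma)

# A newly arrived value is scaled with the stored training statistics.
z_new = standardize([30.0], mu, sigma)
```

Storing `mu` and `sigma` from the training period keeps the scaling of newly arrived data consistent with the scaling the model was trained on.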
As separate forecast models are developed for the different hours of the day, there is no need to remove the diurnal cycle from the input data. Control filters with minimum concentration, maximum concentration and rate of change criteria were applied here to remove unrealistic low/high observations and to ensure reasonable rates of changes in the measurements.

4.2 Multiple Linear Regression (MLR)

Multiple linear regression models were developed in the free R software (R Development Core Team, 2011) environment for statistical computing with the package “stats”. MLR is a statistical technique for finding the linear relation between the independent variables (predictors) and the dependent or response variable (Kumar and Goyal, 2013). The general MLR model is built from $N$ observations of the multiple predictor variables $x_k$ $(k = 1, \ldots, m)$ and the observed target data $y$. The MLR output variable $\hat{y}$ can be written in terms of the input predictor variables as

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m, \tag{4.1}$$

where $\beta_j$ $(j = 0, \ldots, m)$ are the regression coefficients or parameters determined by minimizing the MSE between the model output and the target data using a linear least squares algorithm. Stepwise regression is applied here using the R software to choose relevant predictor variables by an automatic procedure (going both forward and backward). The model was trained with two years of data (2009/07-2011/07), while the testing and updating were performed daily by batch learning using the 3-year data (2011/08-2014/07). Predictors were re-selected and the linear regression was recalculated during each model update.

4.3 Online-Sequential Multiple Linear Regression (OS-MLR)

To facilitate the rapid and frequent updating of a large number of equations from a linear statistical model, OS-MLR models are developed using the sums-of-squares-and-cross-products matrix (SSCP) (Wilson and Vallée, 2002).
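The idea of the updating is to do part of the necessary recalculation of regression coefficients in near-real time by updating the SSCP matrix and storing the data in that form rather than as raw observations. The MLR model in (4.1) involves finding the least squares solution of the linear system

$$\mathbf{X}\boldsymbol{\beta} = \mathbf{Y}, \tag{4.2}$$

where the input data matrix $\mathbf{X}$ of dimension $N \times (m+1)$ is

$$\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Nm} \end{bmatrix}, \tag{4.3}$$

and the $\boldsymbol{\beta}$ parameter vector of length $m+1$ and the target data vector $\mathbf{Y}$ of length $N$ are, respectively,

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} \quad \text{and} \quad \mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}. \tag{4.4}$$

Minimizing $\|\mathbf{X}\boldsymbol{\beta} - \mathbf{Y}\|^2$ leads to the solution

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}, \tag{4.5}$$

where $\hat{\boldsymbol{\beta}}$ is the least squares estimate of the regression coefficients and $\mathbf{K} = \mathbf{X}^T\mathbf{X}$ is the SSCP or data covariance matrix.

First, start with a training set with $N_0$ data points; the solution is given by $\boldsymbol{\beta}^{(0)} = \mathbf{K}_0^{-1}\mathbf{X}_0^T\mathbf{Y}_0$, where $\mathbf{K}_0 = \mathbf{X}_0^T\mathbf{X}_0$.

Next, suppose a new chunk of data containing $N_1$ data points has arrived. Updating the model requires minimizing

$$\left\| \begin{bmatrix} \mathbf{X}_0 \\ \mathbf{X}_1 \end{bmatrix} \boldsymbol{\beta} - \begin{bmatrix} \mathbf{Y}_0 \\ \mathbf{Y}_1 \end{bmatrix} \right\|^2, \tag{4.6}$$

yielding the new parameter vector

$$\boldsymbol{\beta}^{(1)} = \mathbf{K}_1^{-1} \begin{bmatrix} \mathbf{X}_0 \\ \mathbf{X}_1 \end{bmatrix}^T \begin{bmatrix} \mathbf{Y}_0 \\ \mathbf{Y}_1 \end{bmatrix}, \quad \text{with} \quad \mathbf{K}_1 = \begin{bmatrix} \mathbf{X}_0 \\ \mathbf{X}_1 \end{bmatrix}^T \begin{bmatrix} \mathbf{X}_0 \\ \mathbf{X}_1 \end{bmatrix}. \tag{4.7}$$

For sequential learning, $\boldsymbol{\beta}^{(1)}$ would need to be expressed only in terms of $\boldsymbol{\beta}^{(0)}$, $\mathbf{K}_1$, $\mathbf{X}_1$ and $\mathbf{Y}_1$. $\mathbf{K}_1$ can be written as

$$\mathbf{K}_1 = \begin{bmatrix} \mathbf{X}_0^T & \mathbf{X}_1^T \end{bmatrix} \begin{bmatrix} \mathbf{X}_0 \\ \mathbf{X}_1 \end{bmatrix} = \mathbf{K}_0 + \mathbf{X}_1^T\mathbf{X}_1. \tag{4.8}$$

In (4.7),

$$\begin{aligned} \begin{bmatrix} \mathbf{X}_0 \\ \mathbf{X}_1 \end{bmatrix}^T \begin{bmatrix} \mathbf{Y}_0 \\ \mathbf{Y}_1 \end{bmatrix} &= \mathbf{X}_0^T\mathbf{Y}_0 + \mathbf{X}_1^T\mathbf{Y}_1 \\ &= \mathbf{K}_0\mathbf{K}_0^{-1}\mathbf{X}_0^T\mathbf{Y}_0 + \mathbf{X}_1^T\mathbf{Y}_1 \\ &= \mathbf{K}_0\boldsymbol{\beta}^{(0)} + \mathbf{X}_1^T\mathbf{Y}_1 \\ &= (\mathbf{K}_1 - \mathbf{X}_1^T\mathbf{X}_1)\boldsymbol{\beta}^{(0)} + \mathbf{X}_1^T\mathbf{Y}_1 \\ &= \mathbf{K}_1\boldsymbol{\beta}^{(0)} - \mathbf{X}_1^T\mathbf{X}_1\boldsymbol{\beta}^{(0)} + \mathbf{X}_1^T\mathbf{Y}_1. \end{aligned} \tag{4.9}$$

Substituting (4.9) into (4.7), we get

$$\begin{aligned} \boldsymbol{\beta}^{(1)} &= \mathbf{K}_1^{-1}\left(\mathbf{K}_1\boldsymbol{\beta}^{(0)} - \mathbf{X}_1^T\mathbf{X}_1\boldsymbol{\beta}^{(0)} + \mathbf{X}_1^T\mathbf{Y}_1\right) \\ &= \boldsymbol{\beta}^{(0)} + \mathbf{K}_1^{-1}\mathbf{X}_1^T\left(\mathbf{Y}_1 - \mathbf{X}_1\boldsymbol{\beta}^{(0)}\right). \end{aligned} \tag{4.10}$$

Generalizing the recursive algorithm for updating, when the $(k+1)$th chunk of new data arrives, the least squares solution can be written as

$$\begin{aligned} \boldsymbol{\beta}^{(k+1)} &= \boldsymbol{\beta}^{(k)} + \mathbf{K}_{k+1}^{-1}\mathbf{X}_{k+1}^T\left(\mathbf{Y}_{k+1} - \mathbf{X}_{k+1}\boldsymbol{\beta}^{(k)}\right), \\ \mathbf{K}_{k+1} &= \mathbf{K}_k + \mathbf{X}_{k+1}^T\mathbf{X}_{k+1}. \end{aligned} \tag{4.11}$$

Eq. (4.11) gives the recursive formula for $\boldsymbol{\beta}^{(k+1)}$ in OS-MLR.

In summary, our initial model was trained by MLR using a 2-year data set (2009/07-2011/07).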
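The recursion (4.11) can be checked numerically with a small sketch. The Python code below is illustrative only and is not the implementation used in this study. It assumes a single predictor plus an intercept, so that K is a 2 × 2 matrix with an explicit inverse; the helper names (`xtx`, `xty`, `solve2`, `batch_fit`, `sequential_update`) and the data values are hypothetical.

```python
def xtx(X):
    """Return K = X^T X for a design matrix whose rows are [1, x]."""
    return [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]


def xty(X, y):
    """Return X^T y as a list of length 2."""
    return [sum(r[i] * t for r, t in zip(X, y)) for i in range(2)]


def solve2(K, v):
    """Solve the 2 x 2 system K b = v using the explicit inverse of K."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(K[1][1] * v[0] - K[0][1] * v[1]) / det,
            (K[0][0] * v[1] - K[1][0] * v[0]) / det]


def batch_fit(X, y):
    """Eq. (4.5): beta = (X^T X)^{-1} X^T y. Also return the SSCP matrix K."""
    K = xtx(X)
    return solve2(K, xty(X, y)), K


def sequential_update(beta, K, X_new, y_new):
    """Eq. (4.11): update beta and K using only the newly arrived chunk."""
    K_add = xtx(X_new)
    K_next = [[K[i][j] + K_add[i][j] for j in range(2)] for i in range(2)]
    # Residuals of the new chunk under the current coefficients.
    resid = [t - (r[0] * beta[0] + r[1] * beta[1]) for r, t in zip(X_new, y_new)]
    step = solve2(K_next, xty(X_new, resid))
    return [beta[0] + step[0], beta[1] + step[1]], K_next


# Hypothetical data: an initial training set and one newly arrived chunk.
X0, Y0 = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 2.9, 5.1]
X1, Y1 = [[1.0, 3.0], [1.0, 4.0]], [7.2, 8.8]

beta0, K0 = batch_fit(X0, Y0)
beta_seq, K1 = sequential_update(beta0, K0, X1, Y1)

# Reference: batch fit on all data, which the sequential result should match.
beta_all, K_all = batch_fit(X0 + X1, Y0 + Y1)
```

In this sketch `beta_seq` agrees with `beta_all` to rounding error, although the sequential update never revisits `X0` and `Y0`; only the current coefficients and the SSCP matrix are carried forward.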
After that, prediction and model updating were conducted daily by the OS-MLR algorithm using a 3-year data set (2011/08-2014/07).

4.4 Multi-layer Perceptron Neural Network (MLP NN)

To construct the MLP NN model (Fig. 2.1), the neural network is considered to be a system that receives information from $m$ input variables $x_k$ ($k = 1, \ldots, m$), namely meteorological, physical and chemical predictors, and produces a single output, in our case the concentration of ozone, PM2.5 or NO2. No prior knowledge about the relationship between input and output variables is assumed. The MLP NN forecast model is developed in R software using the “monmlp” package (Cannon, 2012a). The activation function used is the hyperbolic tangent function for the hidden layer and the identity function for the output layer. Hence, the MLP NN with $L$ hidden nodes or neurons is mathematically modeled by

$$\hat{y}_i = \sum_{j=1}^{L} \beta_j f(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) + \beta_0, \quad (i = 1, \ldots, N), \tag{4.12}$$

where $f$ is the tanh function, $\mathbf{x}_i$ and $\hat{y}_i$ are the model input and output, respectively, for the $i$th data point, $N$ is the number of data points, $\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jm}]^T$ and $b_j$ are the weights or parameters connecting the input layer to the $j$th hidden node, and $\boldsymbol{\beta} = [\beta_1, \beta_2, \ldots, \beta_L]^T$ and $\beta_0$ are the weights/parameters connecting the hidden nodes to the output.

Training the MLP NN model involves adjusting the parameters or weights to minimize the objective function $J$, defined here to be the mean squared error (MSE) between the model output and the target data $y_i$:

$$J = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2. \tag{4.13}$$

The minimization of $J$ involves using back-propagation (Hsieh, 2009). A common problem in the development of a neural network is determining the optimal number of hidden nodes (HN). A sequential, small grid search with bagging (abbreviated from Bootstrap Aggregating) is used to select the optimal number of HN (Lima et al., 2015).

Bagging (Breiman, 1996) is an ensemble method developed from the idea of bootstrapping in statistics.
Under bootstrap resampling, data are randomly selected repeatedly from a dataset with replacement to form a new training dataset, which has the same number of data points as the original dataset. A data point in the original dataset can be selected more than once into the new training dataset. During the random draws, predictor and predictand pairs are drawn together. For autocorrelated data, data segments about the length of the autocorrelation time scale are drawn instead of individual data points. In the bagging approach, one model is built from each bootstrap sampled set, so an ensemble of models can be derived using a large number of bootstrap sets. By averaging the model output from individual members in the ensemble, a final output is obtained. The data not selected in a bootstrap (the “out-of-bag” data) are used as validation data. To prevent overfitting to noise in the data, NN model training is stopped when the model error calculated from the validation data begins to increase (Hsieh, 2009).

During the grid search for the optimal number of HN using data from 2009/07 to 2011/07, we used an ensemble of 30 (bagging) models, and sequentially increased the number of HN one by one until the maximum number of HN was reached or consecutive increments brought no improvement, based on the out-of-bag error. Due to the limitation of computational resources and the time-consuming nature of the MLP NN, models were only batch updated seasonally using the 3-year data from 2011/08-2014/07. In each update, 30 bagging ensemble members were run with the same number of HN as found in the initial training.

4.5 Extreme Learning Machine (ELM)

The extreme learning machine was proposed by Schmidt et al. (1992) and Huang et al. (2006b) based on single-hidden layer feed-forward neural networks (SLFNs) with random weights in the hidden layer.
The ELM algorithm implements an SLFN similar in structure to an MLP NN model (Fig. 2.1) and is mathematically modeled as in (4.12).

Our ELM uses the same activation functions as our MLP NN model, i.e. the hyperbolic tangent function for the hidden layer and the identity function for the output layer. Huang et al. (2006b) proved that the $\mathbf{w}_j$ and $b_j$ parameters in (4.12) can be randomly assigned if the activation function is infinitely differentiable, so only the $\beta$ parameters need to be optimized when minimizing the mean squared error between the model output $\hat{y}$ and the target data $y$. Thus, in the ELM approach, training an SLFN is equivalent to simply finding the least-squares solution of the linear system

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{Y}, \tag{4.14}$$

where the hidden layer output matrix $\mathbf{H}$ of dimension $N \times (L+1)$ is

$$\mathbf{H} = \begin{bmatrix} 1 & f(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & f(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & f(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & f(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}, \tag{4.15}$$

and the $\boldsymbol{\beta}$ parameter vector of length $L+1$ and the target data vector $\mathbf{Y}$ of length $N$ are

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_L \end{bmatrix} \quad \text{and} \quad \mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}. \tag{4.16}$$

Eqs. (4.14) and (4.16) are mathematically identical to the MLR Eqs. (4.2) and (4.4), hence ELM has transformed an MLP NN model requiring complicated nonlinear optimization into a simple MLR problem. The solution of the linear system for $\boldsymbol{\beta}$ is obtained by least squares as in MLR, i.e.

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{Y}, \tag{4.17}$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudo-inverse (Liang et al., 2006).

We consider the case where $\mathbf{H}$ has full column rank (rank $L + 1$, i.e. the $L$ hidden nodes plus the column of ones), so that $\mathbf{H}^T\mathbf{H}$ is invertible. Then $\mathbf{H}^{\dagger}$ is given by

$$\mathbf{H}^{\dagger} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T, \tag{4.18}$$

and

$$\hat{\boldsymbol{\beta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{Y}. \tag{4.19}$$

Huang et al. (2006b) did not have the bias parameter $\beta_0$ in the output layer. As leaving out $\beta_0$ might make learning more difficult (Thimm and Fiesler, 1997), we included it by having a first column of ones in $\mathbf{H}$ and having $\beta_0$ in the top row of the $\boldsymbol{\beta}$ vector (Romero and Alquézar, 2012).
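The batch ELM training described above, i.e. random hidden-layer parameters followed by the least squares solution (4.19), can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the code used in this study: the function names, the synthetic data, the choice of four hidden nodes and the uniform [-1, 1] sampling range for the hidden weights and biases are all hypothetical. The normal equations are solved by a small Gaussian elimination routine to keep the sketch self-contained; a numerical library's least squares solver would normally be preferred.

```python
import math
import random


def hidden_row(x, W, b):
    """One row of H: a leading 1 for beta_0, then tanh(w_j . x + b_j)."""
    return [1.0] + [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                    for w, bj in zip(W, b)]


def solve(A, v):
    """Solve A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z


def train_elm(X, y, n_hidden, rng):
    """Draw random hidden parameters, then solve Eq. (4.19) for beta."""
    n_inputs = len(X[0])
    # Assumed sampling range; the hidden parameters are never trained.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = [hidden_row(x, W, b) for x in X]
    cols = n_hidden + 1
    HtH = [[sum(h[i] * h[j] for h in H) for j in range(cols)] for i in range(cols)]
    HtY = [sum(h[i] * t for h, t in zip(H, y)) for i in range(cols)]
    return W, b, solve(HtH, HtY)


def predict(x, W, b, beta):
    """Model output for one input vector x, as in Eq. (4.12)."""
    return sum(bj * hj for bj, hj in zip(beta, hidden_row(x, W, b)))


# Synthetic two-predictor example with a nonlinear target.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(40)]
y = [math.sin(2.0 * x1) + x2 * x2 for x1, x2 in X]

W, b, beta = train_elm(X, y, n_hidden=4, rng=rng)
y_hat = [predict(x, W, b, beta) for x in X]
mse = sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)
```

Because the column of ones is part of H, the fitted model can always reproduce the mean of the target, so the training MSE cannot exceed the variance of `y` (up to rounding).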
Huang and Wang (2006) set the number of hidden neurons empirically and used a uniform random distribution in the range [-1, 1] for both the weights and bias parameters in the hidden layer. In order to model the nonlinear relationships in the air quality problem efficiently, we need a more accurate and automatic way to choose the parameters and the network structure (i.e. the number of HN).

The hill climbing method was used to decide the optimal number of HN in ELM. Hill climbing is a simple mathematical optimization technique that starts with an arbitrary solution to a problem, and then attempts to find a better solution (smaller MSE) by incrementally changing a single element (the number of hidden nodes). If the change produces a better solution, an incremental change (often by a defined step size) is made towards the new solution. Iterations continue until no further improvements can be found. In the hill climbing algorithm used here, the step size is automatically adjusted by the algorithm: it shrinks when the probes do poorly and grows when the probes do well, helping the algorithm to be more efficient and robust (Yuret, 1994). For each candidate, a 10-fold cross-validation within the training set was performed to avoid over-fitting. Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, with a single subsample retained as the validation data for testing the model, and the remaining k − 1 subsamples used as training data. The process is repeated k times so that all k subsamples are used as validation data.

As weights are randomly assigned in ELM, diversity is an important factor because model complexity is affected by the variance of the distribution. Typically a uniform distribution spread over a fixed interval [−r, r] is used for the weight distribution.
Thimm and Fiesler (1997) gave a review of random weight initialization methods for MLP and found that $r$ should be of the form

$$r = a F^{-0.5}, \tag{4.20}$$

where $F$ is the number of predictors in the case of a 1-hidden layer MLP model. Using the hyperbolic tangent as the activation function, they found $a \approx 1$ to be a reasonable value. For our ELM, $a = 1$ was chosen, and the random bias parameter $b_j$ in the hidden layer was chosen to be uniformly distributed within [-1, 1].

The ELM algorithm was only used to do the initial model training with two-year data sets (2009/07-2011/07). Model updating was then conducted by the OS-ELM algorithm. Like many other learning algorithms, the stochastic nature of ELM means that different trials of simulation may yield different results. The random assignment of weight and bias parameters in the hidden layer makes each ELM distinct. To make the ELM model more stable, we use an ensemble of 30 members in the ELM models, and the output of the ensemble is the average of the individual ensemble members.

4.6 Online-Sequential Extreme Learning Machine (OS-ELM)

As ELM randomly chooses weights for the hidden layer and analytically determines weights in the output layer by linear least squares, the ELM algorithm can be adapted for online sequential learning in the same way as the linear regression model in Sec 4.3.

Given $N_0$ observations in the initial training set with $N_0 \geq L$, the number of HN, if we use batch ELM to train the model, the norm $\|\mathbf{H}_0\boldsymbol{\beta} - \mathbf{Y}_0\|^2$ is minimized and the solution (4.19) gives $\boldsymbol{\beta}^{(0)} = \mathbf{K}_0^{-1}\mathbf{H}_0^T\mathbf{Y}_0$, where $\mathbf{K}_0 = \mathbf{H}_0^T\mathbf{H}_0$.

Analogous to the online sequential solution (4.11) for OS-MLR, when the $(k+1)$th chunk of new data arrives, the recursive least-squares solution for updating OS-ELM is

$$\begin{aligned} \boldsymbol{\beta}^{(k+1)} &= \boldsymbol{\beta}^{(k)} + \mathbf{K}_{k+1}^{-1}\mathbf{H}_{k+1}^T\left(\mathbf{Y}_{k+1} - \mathbf{H}_{k+1}\boldsymbol{\beta}^{(k)}\right), \\ \mathbf{K}_{k+1} &= \mathbf{K}_k + \mathbf{H}_{k+1}^T\mathbf{H}_{k+1}. \end{aligned} \tag{4.21}$$

To summarize, the OS-ELM approach consists of two parts: an initial training phase and a sequential learning phase.
The initialization phase involves batch learning with ELM on the initial training data and analytically solves (4.19). Following the initialization phase, the model is updated in the online sequential learning phase by (4.21) using the newly arrived chunks of data. Once a chunk of data has been used, it can be discarded as it is not used in future model updates (Liang et al., 2006). For our case, the initial model training was conducted by the ELM algorithm using two years of data (2009/07-2011/07) with 30 ensemble members. Models were then updated daily by the OS-ELM algorithm using the 3-year data set from 2011/08 to 2014/07. The number of hidden nodes was chosen in the initial learning phase using the hill climbing method and was not changed during the online-sequential learning phase.

4.7 Model Evaluation

Several statistical scores were used to evaluate the performance of the O3, NO2 and PM2.5 models, including the Pearson correlation coefficient (r), mean absolute error (MAE), MAE/MAD (MAD being the mean absolute deviation), root mean squared error (RMSE) and skill score (SS).

4.7.1 Pearson Correlation Coefficient (r)

The Pearson correlation coefficient, reflecting the degree of linear relationship between two variables, is defined by

$$r = \frac{\mathrm{cov}(\hat{\mathbf{Y}}, \mathbf{Y})}{\sigma_{\hat{Y}}\,\sigma_{Y}}, \tag{4.22}$$

where $\hat{\mathbf{Y}}$ denotes the model predicted pollutant concentrations, $\mathbf{Y}$ the observed values, cov the covariance and $\sigma$ the standard deviation. This coefficient varies from -1 to 1, with 0 indicating no linear relationship. While the Pearson correlation is a good measure of the linear association between predictions and observations, it does not take into account the prediction bias, and is sensitive to rare extreme events.

4.7.2 Mean Absolute Error (MAE)

The mean absolute error (MAE) is the average absolute value of the forecast errors, with

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|, \tag{4.23}$$

where $N$ is the number of data points, $y_i$ is the observed value and $\hat{y}_i$ is the predicted value.
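The evaluation scores of this section (the correlation and MAE defined above, together with MAE/MAD, RMSE and the skill score defined in Sections 4.7.3 to 4.7.5) can be sketched in Python as follows. The function names and the sample observations and forecasts are hypothetical, and the sketch is not the evaluation code used in this study.

```python
import math


def pearson_r(pred, obs):
    """Eq. (4.22); the 1/N factors in cov and sigma cancel."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def mae(pred, obs):
    """Eq. (4.23): mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def mae_over_mad(pred, obs):
    """Eq. (4.24): MAE normalized by the mean absolute deviation of obs."""
    mean_obs = sum(obs) / len(obs)
    mad_sum = sum(abs(o - mean_obs) for o in obs)
    return sum(abs(p - o) for p, o in zip(pred, obs)) / mad_sum


def rmse(pred, obs):
    """Eq. (4.25): root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def skill_score(a, a_ref, a_perfect=0.0):
    """Eq. (4.26): positive when the model beats the reference."""
    return (a - a_ref) / (a_perfect - a_ref)


# Hypothetical ozone values (ppb): observations, a model and a reference.
obs = [20.0, 25.0, 30.0, 35.0, 40.0]
pred = [22.0, 24.0, 31.0, 33.0, 41.0]
ref_pred = [24.0, 21.0, 34.0, 31.0, 44.0]

r = pearson_r(pred, obs)
model_mae = mae(pred, obs)
model_rmse = rmse(pred, obs)
relative_error = mae_over_mad(pred, obs)
ss = skill_score(model_mae, mae(ref_pred, obs))
```

With MAE as the score and a perfect score of zero, (4.26) reduces to SS = 1 − MAE/MAE_ref, so the skill score can be read as the fractional reduction in MAE relative to the reference model.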
4.7.3 MAE/MAD

The average air pollutant concentrations can vary from one location to another. Model predictions for areas with large variations in pollutant concentration levels usually have higher MAE than those for areas with smaller variations. Therefore, the MAE is not always useful for comparing model results from different locations. To normalize the errors, MAE is divided by the mean absolute deviation (MAD) of the observations, yielding a relative mean absolute error that allows comparison between different locations:

$$\frac{\mathrm{MAE}}{\mathrm{MAD}} = \frac{\sum_{i=1}^{N} |\hat{y}_i - y_i|}{\sum_{i=1}^{N} |y_i - \bar{y}|}, \tag{4.24}$$

where $\bar{y}$ is the mean of $y$.

4.7.4 Root Mean Square Error (RMSE)

The root mean squared error (RMSE) is the square root of the mean squared error between the predictions and observations,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}. \tag{4.25}$$

RMSE is more sensitive to outliers than the MAE.

4.7.5 Skill Score (SS)

The skill score is used to determine the skill of a forecast model by comparing it to a base or reference model such as climatology or persistence. In this study the skill score is given by

$$\mathrm{SS} = \frac{A - A_{\mathrm{ref}}}{A_{\mathrm{perfect}} - A_{\mathrm{ref}}}, \tag{4.26}$$

where $A$ represents the MAE from using the MLR, OS-MLR, MLP NN or OS-ELM model, $A_{\mathrm{ref}}$ is the MAE of the UMOS reference model and $A_{\mathrm{perfect}} = 0$ is the MAE of a perfect model. A model with SS > 0 performs better than the reference model, whereas SS < 0 means it has less skill than the reference model. The skill score makes it easy to compare the forecast skill of different models.

4.8 Summary

In summary, stepwise multiple linear regression (MLR), online-sequential multiple linear regression (OSMLR), multi-layer perceptron neural network (MLP NN) and online-sequential extreme learning machine (OSELM) models were developed and updated using a total of five years of air quality data. Model performance was evaluated by several statistical scores and compared with that of the UMOS model. The five models compared include nonlinear models (MLP NN, OSELM) and linear models (MLR, OSMLR, UMOS).
They included two different approaches to model updating, namely batch learning (MLP NN, MLR) and online-sequential learning (OSMLR, OSELM, UMOS).

Chapter 5
Results and Discussion

Model performance on testing data was evaluated by comparing the five model estimates (UMOS, MLR, OSELM, OSMLR and MLP NN) with the near-real-time observed ozone concentrations. A comparison between our four models and UMOS would not be completely fair due to differences in model strategy and some fundamental deficiencies in the UMOS implementation. UMOS uses separate models for the warm and cold seasons, whereas our models were developed for the whole year. Furthermore, the integer precision of the UMOS output and the UMOS forecasts made under missing data conditions likely lowered the UMOS forecast scores. Nevertheless, we have included UMOS in our model comparisons below.

5.1 Ozone

The boxplot (Fig 5.1) shows the observed and predicted values at the six stations. The boxplot is a convenient way of displaying the distribution of data based on the following statistics: median, first quartile, third quartile, minimum and maximum. In the boxplot, the central rectangle spans the first quartile to the third quartile, while the waistline inside the rectangle shows the median. The distance between the first and third quartiles is the interquartile range (IQR). The upper whisker extends to the highest value within 1.5 IQR from the top of the rectangle, while the lower whisker extends to the lowest value within 1.5 IQR from the bottom of the rectangle. Values beyond the end of the whiskers are considered outliers and are shown as dots. Among the six stations, Winnipeg has the highest average ozone value and the most extreme cases, although the extreme values have not been vetted sufficiently to know if they are valid observations, i.e. they could be related to wildfire events upwind or local smoke issues.
Fig 5.1 also indicates that the forecast models tended to underestimate the median ozone concentration, and the extremes in Winnipeg.

Figure 5.1: Boxplot of the observed ozone values and the predicted values from five methods over all forecast lead times (1 - 48hr) at six stations.

Ozone model forecast error statistics are shown in Fig 5.2. The Pearson correlation (r) for the five models ranges from 0.75 to 0.85. The MAE for the models ranges from 4.8 ppb to 7.3 ppb and the RMSE ranges from 6.4 ppb to 9.8 ppb. The normalized error MAE/MAD varies from 0.47 to 0.64.

According to the correlation and normalized error, the models have the best performance in Vancouver, with the smallest error and highest correlation attained by the OSELM method. Due to the extreme values in Winnipeg (Fig 5.1), all forecast models performed more poorly there than at other stations in terms of MAE and RMSE. All four models had better performance than UMOS, the benchmark, and OSELM generally outperformed the other methods over the six stations. Both OSMLR and stepwise MLR are daily updated linear methods and both performed well in ozone forecasting, with these two linear regression methods showing the best skill in Winnipeg. The seasonally updated MLP NN model tended to underperform when compared with MLR, OSELM and OSMLR, as the other models were updated daily and the MLP NN only every 3 months due to its high computational cost: updating the MLP NN seasonally used more than 10 times the CPU time of the OSELM updated daily.

Figure 5.2: Ozone forecast scores of different methods averaged over all forecast lead times (1 - 48hr) at the six stations.

Forecast correlation scores from the five models are shown by heat maps in Fig 5.3 as the forecast lead time varies from 1 - 48 hrs, with the forecast initiated at 00 UTC and 12 UTC.
The heat maps show high correlation in red and low correlation in blue, with some missing values in Edmonton and Winnipeg. From the mean ozone concentration plots, a strong diurnal cycle can be found at all six stations (top panels in Fig 5.3). Vancouver, which has the lowest mean ozone concentration among the six stations, has its peak during late afternoon (local time 4 pm - 6 pm). Forecast models have better correlation scores during the peak time, and the 00 UTC initiated models also work well from 1 to 12 forecast hours. UMOS lost to the other methods in Vancouver, whereas the seasonally updated MLP NN method has competitive performance against the daily updated linear methods.

Edmonton has an average ozone concentration above 25 ppb in the local late afternoon. Correlations above 0.8 occur at 1-4 and 16-27 hour forecast lead times of the 00 UTC initiated models and at 1-16 and 30-38 hour lead times of the 12 UTC initiated models. However, the models show poor performance when ozone is low at night (12 am - 6 am) relative to other forecast hours as well as other stations. Winnipeg has the highest ozone concentration among the six stations, with a mean value over 35 ppb during peak time. Models perform well for the first 24 hours when initiated at 00 UTC, but only for the first 12 forecast hours when initiated at 12 UTC.

In Toronto, correlation scores of the 2-3, 16-26 and 42-48 hour lead time forecasts initiated at 00 UTC are above 0.8, while for models initiated at 12 UTC, better performance occurs at the 1-14 and 30-37 hour lead times. These forecast lead times all correspond to the high ozone concentration period in the diurnal cycle. We can conclude that models initiated at 00 UTC and 12 UTC both work well in predicting ozone concentration at local time (LST) 12 pm - 9 pm, and the 12 UTC initiated model also performs well during the morning. Ozone in Montreal has an average concentration over 30 ppb in late afternoon.
Models initiated at 00 UTC have higher correlation scores during 1-10, 19-24 and 44-46 hour lead times, while the 12 UTC initiated models do well for 1-12 and 32-35 hour lead times, indicating good model skill during daytime (9 am - 8 pm LST).

Comparing different models, OSELM tends to outperform the others by making accurate predictions over a wider range of forecast hours. Comparing stations, models generally have the lowest accuracy in Halifax.
Figure 5.3: Forecast correlation score as a function of the forecast lead time (1-48hr) from the five models displayed in a heat map (bottom panels) for forecasts initiated at 00 UTC (left) and 12 UTC (right) at six stations. Black vertical stripes indicate “missing values”, i.e. fewer than 100 data points were available for model training during 2009/07-2011/07. The mean diurnal ozone cycle is displayed in the top panels. 00 UTC corresponds to local time (daylight saving time) of 5 pm, 6 pm, 7 pm, 8 pm, 8 pm and 9 pm at Vancouver, Edmonton, Winnipeg, Toronto, Montreal and Halifax, respectively.

To evaluate the performance of the models, forecast skill scores were calculated relative to the UMOS reference model based on the MAE. Fig 5.4 shows the skill scores for different stations and forecast lead times, with scores above zero indicating that the models have smaller MAE than UMOS. All four methods (stepwise MLR, OSELM, OSMLR and MLP NN) have positive skill scores for most forecast lead times at all six stations, especially during the high ozone concentration period in the diurnal cycle. OSELM shows the best performance at Vancouver, Edmonton, Toronto, Montreal and Halifax, often outperforming UMOS by more than 10%. Linear methods (MLR and OSMLR) also do well in ozone forecasting, showing higher skill scores than OSELM in Winnipeg.
The seasonally updated MLP NN method slightly underperformed the other three methods and is the only method underperforming UMOS in Winnipeg when initiated at 00 UTC.

To compare model performance in different seasons, prediction and observational data during the testing period (2011/08-2014/07) are broken into a warm season (April, May, June, July, August, September) and a cold season (October, November, December, January, February, March). Table 5.1 shows the mean, standard deviation and maximum value by station and season. It indicates generally higher mean and maximum values during the warm season, although some extreme events can contribute to the maximum value during the cold season (e.g. Winnipeg). The ozone residuals (prediction−observation) (Fig 5.5) show a tendency to have a negative median value (i.e. the models have a tendency to underpredict) during the warm season. The UMOS model was developed separately for the warm season and the cold season, whereas MLR, OSELM, OSMLR and MLP NN were developed using two entire years of data, which could be the reason that the latter four models underestimated the ozone concentration during the warm season. For the forecast scores in Fig 5.6, although MAE and RMSE are higher during the warm season, models tend to have better performance in the warm season in terms of MAE/MAD and r for all stations except Vancouver. Comparing different models, OSELM tends to have better scores than the other four models in Vancouver, Toronto, Montreal and Halifax for both seasons.

Figure 5.4: Ozone MAE skill score of different models by forecast hour, with forecasts initiated at 00 UTC (left column) and 12 UTC (right column). The panels are arranged in six rows, from Vancouver (top) to Halifax (bottom).

Station      Warm Season               Cold Season
             Mean   Std.Dev.   Max     Mean   Std.Dev.   Max
Vancouver    19.9   11.5       52.0    13.9   12.2       47.6
Edmonton     25.3   12.3       64.0    15.9   10.4       63.0
Winnipeg     33.9   17.0       97.0    26.0   13.4       147.0
Toronto      31.2   14.4       90.0    21.4   10.2       72.0
Montreal     27.3   13.0       75.9    20.9   11.2       56.5
Halifax      25.2   11.6       68.0    27.0    9.8       58.0

Table 5.1: Statistical properties of ozone concentration (ppb) by station and season.

Figure 5.5: Boxplot of the ozone residuals (prediction−observation) by season and station. Outliers are not plotted but can be seen in Fig 5.7.

Figure 5.6: Ozone forecast scores (MAE, RMSE, MAE/MAD and r) by season and station.

To study how well the models forecast extreme events of ozone concentration, the top 10th percentile of the ozone observations and the corresponding model predictions were extracted for the two seasons during 2011/08-2014/07. Table 5.2 provides statistical properties of the top 10th percentile ozone concentration observations by season and station, showing mean values over 40 ppb during the warm season and over 35 ppb during the cold season. Winnipeg has the highest extreme ozone concentration, with a 64 ppb average during the warm season and a 147 ppb maximum during the cold season over three years. Fig 5.7 presents the ozone residuals for the top 10th percentile. As the models produced 48 hours of forecasts, a high ozone day can generate multiple outlier points in the boxplot. The models all have a negative median, indicating underprediction of the extreme values, slightly less serious in the cold season than in the warm season.
The forecast scores for the top 10th percentile (Fig 5.8), when compared with the forecast scores for all data (Fig 5.6 or Fig 5.2), reveal that the linear models (OSMLR and MLR) tended to improve relative to the nonlinear models (OSELM and MLP NN) when considering only the top 10th percentile, i.e. the linear models tended to perform better than the nonlinear models when forecasting on high ozone days.

Station      Warm Season               Cold Season
             Mean   Std.Dev.   Max     Mean   Std.Dev.   Max
Vancouver    40.2   3.3        52.0    37.0   3.2        47.6
Edmonton     46.6   4.3        64.0    35.3   5.3        63.0
Winnipeg     64.0   6.3        97.0    48.8   6.8        147.0
Toronto      58.1   7.3        90.0    38.8   4.8        72.0
Montreal     50.1   5.2        75.9    39.3   3.8        56.5
Halifax      45.0   3.9        68.0    42.5   2.5        58.0

Table 5.2: Statistical properties of top 10th percentile ozone concentration by station and season.

Figure 5.7: Boxplot of ozone residuals (prediction−observation) (over all lead times) from the top 10th percentile by season and station.

Figure 5.8: Ozone forecast scores of top 10th percentile by season and station.

5.2 PM2.5

PM2.5 observations and predictions by station are shown in Fig 5.9. Edmonton and Montreal have the highest median PM2.5 concentration (over 8 µg/m3) among the six stations, while Vancouver has the lowest median (5 µg/m3). From the boxplot, in terms of the median, UMOS tends to agree with the observed PM2.5 values better than the other methods, which tend to over-predict in Edmonton, Winnipeg, Montreal and Halifax.
Outliers in Edmonton, Winnipeg and Montreal are more common than at the other three stations, indicating more extreme PM2.5 values.

Figure 5.9: Boxplot of the PM2.5 observations and predictions by different methods at six stations.

Fig 5.10 illustrates the PM2.5 model forecast scores during the testing period by station. Models have poorer scores (MAE/MAD and r) for PM2.5 than for ozone (Fig 5.2). All models have similar MAE and RMSE for PM2.5. The correlation score (r) ranges from 0.4 to 0.7, with higher r found in Vancouver, Toronto and Montreal, and lower r in Winnipeg and Halifax. Comparing different methods, from the relative error (MAE/MAD) plot we found that UMOS had the lowest MAE/MAD in Edmonton, Winnipeg and Montreal. In Fig 5.9, these three stations have more outliers, and MLR, OSELM, OSMLR and MLP NN are all over-predicting, as their median values are above the observed median. OSELM slightly outperformed the other methods in Vancouver, Toronto and Halifax, while OSMLR tended to have the highest relative error at most stations.

Figure 5.10: PM2.5 forecast scores of different methods at the six test stations.

The mean diurnal cycle in the PM2.5 concentration and the forecast correlation score for different forecast lead times and initial hours are shown in Fig 5.11. The heat map shows r > 0.6 in red, r = 0.6 in white, r < 0.6 in blue, and missing values in black. Vancouver, which has the lowest mean PM2.5 concentration, shows two peak periods, at night (11 pm LST) and in the morning (9 am LST), and a trough in late afternoon (5 pm LST). Forecast correlation scores vary according to the PM2.5 diurnal cycle, as models have better performance around 9 am and 11 pm LST.
The diurnal cycle in Edmonton is similar to that of Vancouver, with the highest concentration happening in the morning (10 am LST) and at night (10 pm LST). However, models only have r > 0.6 for 1-8hr forecast lead time for both 00 UTC and 12 UTC initiation time. OSELM and MLP NN slightly outperformed other methods for 12-20hr forecast lead time at 00 UTC initiation. Models in Winnipeg performed worse, with r < 0.6 for most forecast hours, especially for long lead times.

Correlation scores in Toronto and Montreal are the best among the six stations. PM2.5 concentration peaks in the morning (9 am LST) with a secondary peak at night (10 pm LST) for both stations. In Toronto, models have r > 0.6 during 1-24 forecast hours for 00 UTC initiation and 1-36 forecast hours for 12 UTC initiation. The correlation heat map of Montreal shows r > 0.7 for 1-24 forecast hours and the models performed well even during the low concentration period in the diurnal cycle (32-36 forecast lead times of the 12 UTC models). Halifax has a similar mean PM2.5 value as Vancouver, but the models' scores are much worse. Most of the correlation scores are below 0.6, during both high and low concentration periods. UMOS loses to other methods for the 20-24 and 40-48 hour forecasts initiated at 00 UTC, whereas OSELM tends to slightly outperform the other models.

PM2.5 model MAE skill scores (relative to UMOS) are plotted in Fig 5.12. For Vancouver, positive skill scores are found for most forecast lead times for the 00 UTC initiated models, while only OSMLR loses to UMOS in the 12 UTC initiated models. OSELM and MLP NN tend to have better scores than the two linear daily updating methods in Vancouver. For Edmonton, Winnipeg and Montreal, all of the models developed underperform UMOS (by about 5%), especially the MLP NN and OSMLR models. OSELM is the only one surpassing UMOS for most forecast lead times in Toronto and Halifax.
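As an illustration of how a skill score relative to UMOS can be computed, the following minimal Python sketch assumes the conventional definition of the MAE skill score against a reference forecast, SS = 1 − MAE_model / MAE_UMOS. Under that assumption a positive value means the model improves on UMOS, and a value of −0.05 corresponds to an MAE about 5% larger than that of UMOS. The function names and the sample numbers are hypothetical and are not taken from the thesis data.

```python
import numpy as np


def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean(np.abs(pred - obs)))


def mae_skill_score(pred, ref_pred, obs):
    """Skill of `pred` relative to the reference forecast `ref_pred`.

    Assumed definition: 1 - MAE_model / MAE_reference.
    Positive: better than the reference; negative: worse; 0: equal.
    """
    return 1.0 - mae(pred, obs) / mae(ref_pred, obs)


# Hypothetical PM2.5 values (µg/m3) for one station and one lead time.
obs = [5.0, 8.0, 12.0, 6.0]
umos = [6.0, 9.0, 10.0, 7.0]    # reference forecast, MAE = 1.25
model = [5.5, 8.5, 11.0, 6.0]   # candidate forecast, MAE = 0.5

skill = mae_skill_score(model, umos, obs)
print(round(skill, 3))
```

Applying the same function separately to each forecast lead time would give one curve per model, which is the kind of quantity shown in the skill score figures.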
Figure 5.11: Mean diurnal PM2.5 concentration and heat map of the correlation score by model and station for forecast lead time 1-48hr and forecasts initiated at 00 UTC (left) and 12 UTC (right). [One panel pair per station: Vancouver, Edmonton, Winnipeg, Toronto, Montreal, Halifax; x-axis: Forecast Hour; curve: Mean PM2.5 (µg/m3); heat map rows: MLP NN, OSMLR, OSELM, MLR, UMOS; colour: Pearson correlation.]
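The correlation heat maps hold one Pearson correlation per model and forecast lead time. A minimal sketch of that calculation is given below, assuming the forecasts and the matching observations are stored as arrays with one row per forecast run and one column per lead hour, with NaN marking missing values. This layout, the function name and the `min_pairs` threshold are assumptions made for illustration only.

```python
import numpy as np


def correlation_by_lead_time(pred, obs, min_pairs=3):
    """Pearson r between forecasts and observations for each lead hour.

    pred, obs : arrays of shape (n_runs, n_lead_hours); NaN = missing.
    Returns an array of length n_lead_hours; NaN where r is undefined.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_lead = pred.shape[1]
    r = np.full(n_lead, np.nan)
    for h in range(n_lead):
        # Keep only the runs where both forecast and observation exist.
        ok = ~np.isnan(pred[:, h]) & ~np.isnan(obs[:, h])
        if ok.sum() < min_pairs:
            continue
        p, o = pred[ok, h], obs[ok, h]
        # r is undefined when either series is constant.
        if p.std() > 0 and o.std() > 0:
            r[h] = np.corrcoef(p, o)[0, 1]
    return r


# Hypothetical data: 4 forecast runs, 3 lead hours.
pred = np.array([[1, 4, 1], [2, 3, 2], [3, 2, 3], [4, 1, 4]], dtype=float)
obs = np.array([[2, 1, np.nan], [4, 2, np.nan],
                [6, 3, np.nan], [8, 4, np.nan]])

r = correlation_by_lead_time(pred, obs)
print(r)
```

Stacking the result for each model gives one heat map row per model, and a lead hour with too few valid pairs stays NaN, which would correspond to the cells drawn as missing.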
Figure 5.12: PM2.5 MAE skill score of different models by forecast hour for the six stations. [Columns: initiated at 00 UTC and 12 UTC; x-axis: Forecast Hour; y-axis: PM2.5 MAE Skill Score; methods: MLR, OSELM, OSMLR, MLP NN.]

Table 5.3 presents statistical properties of PM2.5 concentration by station and season. High and low PM2.5 values do not correspond to different seasons. Vancouver, Edmonton, Montreal and Halifax have lower mean values during the warm season, though the outlier value of 147 µg/m3 (Fig 5.9) was found in the warm season in Edmonton. In contrast, Winnipeg and Toronto have higher mean PM2.5 concentration during the warm season. Fig 5.13 and Fig 5.14 display model performance by season according to the residuals and forecast scores. Model residuals are relatively smaller in magnitude in the warm season according to the first and third quartiles in Fig 5.13. According to the median of the residuals (Fig 5.13), overprediction mainly occurs in Edmonton, Winnipeg and Montreal for both seasons, which is consistent with the models performing worst relative to UMOS at these three stations (Fig 5.12).

In Fig 5.14, there is little difference in the forecast scores among the models; however, there are differences in the scores between the warm and cold seasons, e.g. all the models perform better in the warm season in Toronto and Halifax, and in the cold season in Edmonton and Montreal. However, the much poorer MAE/MAD and r scores in the warm season relative to the cold season for Edmonton could be caused by the outlier of 147 µg/m3 in the warm season data as noted earlier. For the correlation score, OSELM was marginally ahead of all the other models in the cold season for all stations, and in the warm season for all stations except Edmonton and Winnipeg.

Station      Warm Season                  Cold Season
             Mean    Std.Dev.   Max       Mean    Std.Dev.   Max
Vancouver    4.6     3.0        24.0      5.2     5.1        48.0
Edmonton     8.1     6.9        147.0     9.3     8.1        97.0
Winnipeg     6.7     6.9        83.0      5.7     5.9        88.0
Toronto      8.1     6.2        62.0      7.3     6.7        75.0
Montreal     7.9     6.6        88.0      8.9     7.7        75.0
Halifax      6.1     4.9        46.0      6.3     5.0        55.0

Table 5.3: Statistical properties of PM2.5 concentration (µg/m3) by station and season.

Figure 5.13: Boxplot of PM2.5 residuals (prediction−observation) by season and station. Outliers are not plotted but can be seen in Fig 5.15. [Panels: Warm, Cold; y-axis: PM2.5 (µg/m3); methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

Figure 5.14: PM2.5 forecast scores by season and station. [Panels: Warm, Cold; scores: MAE (µg/m3), RMSE (µg/m3), MAE/MAD, r; methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

The top 10th percentile of the observational data and the corresponding model predictions are extracted by season and station. Table 5.4 shows that the mean extreme value in the cold season is higher than that in the warm season for all stations except Winnipeg, and the maximum PM2.5 concentrations are also higher for the cold season in Vancouver, Winnipeg, Toronto and Halifax. The boxplot of residuals (Fig 5.15) shows that all five models were unable to capture the maximum PM2.5 value (147 µg/m3) in Edmonton, which would have a major impact on the forecast scores. Fig 5.16 presents the error and correlation scores for different models by season. All of the MAE/MAD values are above 1 and most correlations are below 0.3, which indicates weak model performance. For extremes, the nonlinear models in general did not improve on the linear models (UMOS, MLR and OSMLR).

Station      Warm Season                  Cold Season
             Mean    Std.Dev.   Max       Mean    Std.Dev.   Max
Vancouver    11.0    2.4        24.0      17.0    5.1        48.0
Edmonton     22.6    9.7        147.0     27.0    9.5        97.0
Winnipeg     20.7    12.4       83.0      19.2    7.2        88.0
Toronto      21.7    5.5        62.0      22.6    7.5        75.0
Montreal     21.5    9.1        88.0      26.2    7.6        75.0
Halifax      16.7    5.0        46.0      17.1    5.3        55.0

Table 5.4: Statistical properties of top 10th percentile PM2.5 concentration (µg/m3) by station and season.

Figure 5.15: Boxplot of PM2.5 residuals (prediction−observation) (over all lead times) from the top 10th percentile by season and station. [Panels: Warm, Cold; y-axis: PM2.5 (µg/m3); methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

Figure 5.16: PM2.5 forecast scores of top 10th percentile by season and station. [Panels: Warm, Cold; scores: MAE (µg/m3), RMSE (µg/m3), MAE/MAD, r; methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

5.3 NO2

Boxplots of NO2 observations and predictions by station are shown in Fig 5.17. Edmonton has the highest median NO2 concentration (16 ppb) and the most extreme events, whereas Halifax's median concentration is only 1.4 ppb. The station in Halifax is located near Lake Major, which may explain the low NO2 concentration as automobile emission is the main source of NO2. Fig 5.17 also indicates the model medians to lie above the observed median for all stations. The forecast scores in Fig 5.18 show all five methods to have similar performance, and except for Halifax, the relative errors are generally below 0.7, with OSELM being marginally better than the other methods. In Halifax, UMOS has the lowest relative error, but it is still greater than 1.
From the correlation score, OSELM is slightly ahead and UMOS is slightly behind all other methods at all five stations.

Figure 5.17: Boxplot of the observed NO2 values and the predicted values from five models (over all forecast lead times) at six stations. [y-axis: NO2 (ppb); methods: Observation, UMOS, MLR, OSELM, OSMLR, MLP NN.]

Figure 5.18: NO2 forecast scores of different methods in the six stations. [Scores: MAE (ppb), RMSE (ppb), MAE/MAD, r.]

In Fig 5.19, the heat map displays r > 0.7 in red and r < 0.7 in blue and missing data at Edmonton and Winnipeg in black. NO2 often has the highest diurnal value during the morning (9 am LST), with a second peak during the night (10-11 pm LST) and becoming lower in the afternoon (2-4 pm LST). The main exceptions occur in Vancouver, where the second peak is slightly higher than the first, and in Halifax, where the highest peak occurs at 3-4 am LST. Models in Vancouver have good performance during 1-25 hour forecasts when initiated at 00 UTC, and UMOS underperforms the other methods for 20-40 hour forecasts when initiated at 12 UTC. Edmonton, which has the highest mean concentration among the six stations, has the best model behavior among the stations. The difference in r between the five methods is small, though OSELM and MLR are slightly stronger and UMOS slightly weaker than others. In Winnipeg, OSELM is slightly stronger and UMOS slightly weaker than the others, with all models forecasting more poorly during the trough in the diurnal cycle.

In Toronto, the mean NO2 concentration in the morning is over 17.5 ppb. Correlation scores for 1-15 forecast hours from 00 UTC and 1-5 forecast hours from 12 UTC are generally higher than other lead times, again with OSELM being slightly stronger and UMOS slightly weaker among the models.
In Montreal, models have relatively high correlation scores, but poorer performance can be found during the low concentration period and the second peak period, corresponding to the local afternoon and midnight. OSELM and MLR are slightly stronger and UMOS slightly weaker among the models. Halifax has the worst model scores, with correlation below 0.4 most of the time.

MAE skill scores relative to UMOS (Fig 5.20) illustrate that MLR, OSELM, OSMLR and MLP NN have positive skill scores in Vancouver and Edmonton for most forecast hours, and OSELM slightly outperforms other methods, whereas OSMLR and MLP NN slightly underperform in Vancouver and Edmonton, respectively. For Winnipeg, Toronto and Montreal, negative skill scores occur during the low concentration period in the diurnal cycle, upon comparing Fig 5.20 with Fig 5.19, indicating that UMOS is slightly better at forecasting during low NO2 hours. In Winnipeg and Montreal, OSELM slightly outperforms MLR, OSMLR and MLP NN, but all four models have comparable skills in Toronto. In Halifax, all four methods lose to UMOS in the MAE skill score.
Figure 5.19: Mean diurnal NO2 concentration and heat map of the correlation score by model and station. [One panel pair per station: Vancouver, Edmonton, Winnipeg, Toronto, Montreal, Halifax; columns: initiated at 00 UTC and 12 UTC; x-axis: Forecast Hour; curve: Mean NO2 (ppb); heat map rows: MLP NN, OSMLR, OSELM, MLR, UMOS; colour: Pearson correlation.]

Figure 5.20: NO2 MAE skill score of different models by forecast hour for the six stations. [Columns: initiated at 00 UTC and 12 UTC; x-axis: Forecast Hour; y-axis: NO2 MAE Skill Score; methods: MLR, OSELM, OSMLR, MLP NN.]

To analyze the model performance seasonally, Table 5.5 shows the statistical properties of NO2 by season and station, where higher NO2 concentration occurs in the cold season for all six stations. For the warm season (Fig 5.21), linear models (MLR, OSMLR) have median residuals slightly closer to 0 and all our models have median residuals slightly closer to 0 than UMOS for all stations.
However, UMOS has median residuals slightly better than most models in the cold season, especially in Halifax. The models have little spread in the errors (MAE and RMSE) (Fig 5.22); however, MLR, OSELM, OSMLR and MLP NN have slightly higher r scores than UMOS at all stations for both the warm and cold seasons, with OSELM slightly ahead of the other methods.

Station      Warm Season                  Cold Season
             Mean    Std.Dev.   Max       Mean    Std.Dev.   Max
Vancouver    11.2    7.6        46.8      17.2    9.7        56.8
Edmonton     11.6    7.7        62.0      23.6    11.3       155.0
Winnipeg     3.5     5.3        40.0      8.1     8.6        50.0
Toronto      12.0    7.5        60.0      15.6    8.6        65.0
Montreal     7.4     6.2        57.9      12.5    10.0       61.2
Halifax      1.0     1.5        21.0      1.3     2.0        26.0

Table 5.5: Statistical properties of NO2 concentration (ppb) by station and season.

Figure 5.21: Boxplot of NO2 residuals (prediction−observation) by season and station. Outliers are not plotted but can be seen in Fig 5.23. [Panels: Warm, Cold; y-axis: NO2 (ppb); methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

Figure 5.22: NO2 forecast scores by season and station. [Panels: Warm, Cold; scores: MAE (ppb), RMSE (ppb), MAE/MAD, r; methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

For the top 10th percentile observations, Table 5.6 shows Edmonton to have the most extreme event with 155 ppb of NO2. From the model residuals shown in Fig 5.23, the medians are all below 0, indicating underprediction of the extremes, though our model medians tend to be slightly less negative than those from UMOS. The MAE, RMSE, MAE/MAD and correlation scores of top 10th percentile data in Fig 5.24 show that model skills tend to be better in the warm season, and in terms of MAE/MAD our models tend to slightly outperform UMOS at all stations in both the warm and cold seasons.

Station      Warm Season                  Cold Season
             Mean    Std.Dev.   Max       Mean    Std.Dev.   Max
Vancouver    27.3    4.08       46.8      34.83   3.83       56.8
Edmonton     27.78   6.73       62        45.36   8.62       155
Winnipeg     16.34   5.85       40        28.10   6.68       50
Toronto      28.83   5.91       60        34.58   6.12       65
Montreal     21.63   6.07       57.9      34.08   5.52       61.2
Halifax      4.49    1.94       21        5.8     2.67       26

Table 5.6: Statistical properties of top 10th percentile NO2 concentration (ppb) by station and season.

Figure 5.23: Boxplot of NO2 residuals (prediction−observation) (over all lead times) from the top 10th percentile by season and station. [Panels: Warm, Cold; y-axis: NO2 (ppb); methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

Figure 5.24: NO2 forecast scores of top 10th percentile by season and station. [Panels: Warm, Cold; scores: MAE (ppb), RMSE (ppb), MAE/MAD, r; methods: UMOS, MLR, OSELM, OSMLR, MLP NN.]

5.4 Model Results with Antecedent Predictors

Nine antecedent predictors, i.e. the pollutant concentration 24 hours prior to the forecast time, and the maximum and minimum pollutant concentration observed within the 24 hour period prior to the model initialization, for each of ozone, PM2.5 and NO2, were added to the OSELM and OSMLR models (giving OSELM-A and OSMLR-A) to test whether they would contribute to the model accuracy, with the new model results compared with the original UMOS, OSELM and OSMLR results.

For ozone, models with antecedent predictors performed better than the original models in Winnipeg and Halifax but only marginally at the other four stations, as seen in the MAE, RMSE, MAE/MAD and r (Fig 5.25). In the ozone MAE skill score (Fig 5.26), the models with antecedent predictors appear to improve on the original models mainly in Winnipeg and Halifax.
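The construction of the nine antecedent predictors described above can be sketched as follows. This is only an illustration under stated assumptions: each pollutant is held as an hourly series indexed by hour, observations are taken to be available for all hours strictly before the initialization hour `t0`, and a 24-hour-old value that has not yet been observed at initialization is simply left as NaN (how such cases were handled is not stated in this section). The function names and the synthetic series are hypothetical.

```python
import numpy as np


def antecedent_predictors(series, t0, lead):
    """Return (lag-24 value, 24 h max, 24 h min) for one pollutant.

    series : hourly observations, series[t] being the value at hour t
    t0     : index of the model initialization hour (hours < t0 observed)
    lead   : forecast lead time in hours
    """
    series = np.asarray(series, dtype=float)
    if t0 < 24:
        raise ValueError("need 24 hours of observations before t0")
    window = series[t0 - 24:t0]      # the 24 hours before initialization
    idx = t0 + lead - 24             # 24 hours before the forecast time
    # Assumption: a value not yet observed at initialization stays NaN.
    lag24 = float(series[idx]) if idx < t0 else float("nan")
    return lag24, float(np.nanmax(window)), float(np.nanmin(window))


def nine_predictors(obs_by_pollutant, t0, lead):
    """Stack the three predictors of each pollutant into one vector."""
    values = []
    for name in ("O3", "PM2.5", "NO2"):
        values.extend(antecedent_predictors(obs_by_pollutant[name], t0, lead))
    return np.array(values)


# Synthetic hourly series, 48 hours long, for illustration only.
hours = np.arange(48.0)
obs = {"O3": hours, "PM2.5": 2.0 * hours, "NO2": hours + 1.0}

x = nine_predictors(obs, t0=30, lead=6)
print(x)
```

The resulting vector would be appended to the existing predictors for the given initialization time and lead time before the model is updated.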
Forecast scores for data in the top 10th percentile (Fig 5.27) show that in the warm season, the antecedent predictors mainly improved the scores in Winnipeg and Edmonton, while in the cold season, they mainly improved the scores in Halifax, followed by Winnipeg and Edmonton. We conclude that adding the antecedent predictors tends to improve on the ozone forecasts, especially in predicting extreme values at some stations.

Figure 5.25: Ozone forecast scores from models with and without antecedent predictors in the six stations. Models with antecedent predictors (OSELM-A and OSMLR-A) are in red, the original models without the extra predictors are in blue, and UMOS is in black. [Scores: MAE (ppb), RMSE (ppb), MAE/MAD, r.]

Figure 5.26: Ozone MAE skill score from models with and without antecedent predictors by forecast hour. [Columns: initiated at 00 UTC and 12 UTC; x-axis: Forecast Hour; y-axis: Ozone MAE Skill Score; methods: OSELM, OSELM-A, OSMLR, OSMLR-A.]

Figure 5.27: Ozone top 10th percentile forecast scores from models with and without antecedent predictors by season and station. [Panels: Warm, Cold; scores: MAE (ppb), RMSE (ppb), MAE/MAD, r; methods: UMOS, OSELM, OSELM-A, OSMLR, OSMLR-A.]

For PM2.5, Fig 5.28 shows that models with antecedent predictors tended to improve on the original models at all six stations, with smaller MAE/MAD and higher Pearson correlation, mainly in Edmonton, Winnipeg and Halifax.
For the MAE skill score (Fig 5.29) at Edmonton, Winnipeg and Montreal, all our original models had negative skill scores relative to UMOS for most forecast lead times, but with the antecedent predictors added, OSELM-A slightly outperforms UMOS for most forecast hours at these three stations, though OSMLR-A is still behind UMOS. In Vancouver, Toronto and Halifax, the new models have similar performance to the original models. For the top 10th percentile PM2.5 data, Fig 5.30 shows that other than improving the scores in Halifax in the cold season, adding the antecedent predictors brought no clear benefit. In conclusion, models with antecedent predictors added seemed to improve on PM2.5 forecasting, but not in forecasting extreme PM2.5 concentration.

Figure 5.28: PM2.5 forecast scores from models with and without antecedent predictors in the six stations. [Scores: MAE (µg/m3), RMSE (µg/m3), MAE/MAD, r; methods: UMOS, OSELM, OSELM-A, OSMLR, OSMLR-A.]

Figure 5.29: PM2.5 MAE skill score from models with and without antecedent predictors by forecast hour. [Columns: initiated at 00 UTC and 12 UTC; x-axis: Forecast Hour; y-axis: PM2.5 MAE Skill Score; methods: OSELM, OSELM-A, OSMLR, OSMLR-A.]

Figure 5.30: PM2.5 top 10th percentile forecast scores from models with and without antecedent predictors by season and station. [Panels: Warm, Cold; scores: MAE (µg/m3), RMSE (µg/m3), MAE/MAD, r; methods: UMOS, OSELM, OSELM-A, OSMLR, OSMLR-A.]

For NO2, Fig 5.31 shows that adding antecedent predictors offered essentially no improvement at all stations except Halifax.
The MAE skill score (Fig 5.32) shows clear improvement in Halifax: the skill scores of OSELM and OSMLR were mainly negative over all forecast hours, but have changed to mainly positive in OSELM-A and OSMLR-A. For data in the top 10th percentile, adding antecedent predictors mainly helped to improve forecast scores in Halifax, and slightly reduced the errors in the cold season in Vancouver and Montreal.

Figure 5.31: NO2 forecast scores from models with and without antecedent predictors in the six stations. [Scores: MAE (ppb), RMSE (ppb), MAE/MAD, r.]

Figure 5.32: NO2 MAE skill score from models with and without antecedent predictors by forecast hour. [Columns: initiated at 00 UTC and 12 UTC; x-axis: Forecast Hour; y-axis: NO2 MAE Skill Score; methods: OSELM, OSELM-A, OSMLR, OSMLR-A.]

Figure 5.33: NO2 top 10th percentile forecast scores from models with and without antecedent predictors by season and station. [Panels: Warm, Cold; scores: MAE, RMSE, MAE/MAD, r; methods: UMOS, OSELM, OSELM-A, OSMLR, OSMLR-A.]

Chapter 6
Conclusion

6.1 Summary

This study focuses on the hourly spot concentration forecasts of ozone (O3), particulate matter 2.5µm (PM2.5) and nitrogen dioxide (NO2) up to 48 hours for six stations across Canada (Vancouver, Edmonton, Winnipeg, Toronto, Montreal and Halifax). In air quality prediction, model accuracy, efficiency and updatability are key considerations. Many current air quality forecasting methods use only linear techniques, which would miss nonlinear relationships in the data. In many cases, machine learning methods have been found to outperform linear techniques for air quality prediction.
But traditional neural networks have a number of difficulties including computational expense, local minima and over-fitting, which hamper their effectiveness. Consequently, the extreme learning machine (ELM), an updatable machine learning algorithm using randomized neural networks, was applied in air quality forecasting.

In this study, four air quality forecasting models, namely the stepwise multiple linear regression (MLR), online-sequential multiple linear regression (OSMLR), multilayer perceptron neural network (MLP NN) and online-sequential extreme learning machine (OSELM), have been studied using five years of data (2009/07-2014/07). The prediction performances of the MLR, OSMLR, MLP NN and OSELM are evaluated against updatable model output statistics (UMOS) from Environment Canada.

For ozone, all four models (MLR, OSELM, OSMLR and MLP NN) performed better than UMOS, the benchmark, especially during the high ozone concentration period in the diurnal cycle. OSELM showed the best performance among the various models in all stations except Winnipeg, often outperforming UMOS by more than 10% in the MAE skill score. Linear methods (MLR and OSMLR) also did well in ozone forecasting relative to UMOS, showing the highest MAE skill scores among the models in Winnipeg. For the prediction of extreme ozone events (top 10th percentile), all models tended to underpredict extreme values, while linear models tended to improve relative to the nonlinear methods when considering only the top 10th percentile data.

For PM2.5, model correlation scores ranged from 0.5 to 0.7 and all models underperformed UMOS in Edmonton, Winnipeg and Montreal. OSELM was the best model in Vancouver, Toronto and Halifax, surpassing UMOS and the other models for most forecast lead times.
For extreme PM2.5 events, all of the relative errors (MAE/MAD) were above 1, indicating weak model performance, and nonlinear models still in general did not improve on the linear models.

For NO2, OSELM was marginally better than all the other methods in all stations except Halifax. In Halifax, all four methods (MLR, OSELM, OSMLR and MLP NN) lost to UMOS in the MAE, as UMOS tended to forecast slightly better during the low NO2 hours in the diurnal cycle. For the top 10th percentile NO2 observations, OSELM, MLR, OSMLR and MLP NN tended to slightly outperform UMOS at all stations in both warm and cold seasons.

Antecedent predictors have different effects on the three pollutants. Adding the antecedent predictors tended to improve on the ozone forecasts, especially in predicting extreme values. For PM2.5, models with antecedent predictors had improved performance compared to the original models, but not in forecasting extreme PM2.5 concentration. For NO2, adding antecedent predictors offered improvement only at Halifax, but also slightly reduced the errors for extreme NO2 in the cold season in Vancouver and Montreal.

This study has demonstrated the potential of using nonlinear machine learning methods to improve air quality forecasts. In terms of improving forecast accuracy, OSELM appeared most beneficial in ozone forecasts and least in PM2.5 forecasts. MLP NN, which was updated with new data only seasonally due to the large computational cost, generally underperformed the daily updated linear methods (MLR and OSMLR). In contrast, OSELM, with its low cost updating and nonlinear modeling capability, generally outperformed the linear methods in forecast accuracy. Antecedent predictors could also be added to the models to improve forecast accuracy.

6.2 Future Research

A limitation of this research was that models were not developed separately by season because of the limited data record.
When more data are available, it could be worth investigating using separate models for different seasons. Another limitation is the model structure of the OSELM and MLP NN methods. The number of hidden nodes was decided according to the initial training data and did not change in subsequent model updates because of limited computational resources, resulting in fixed model complexity. As new data arrive, information on longer term variability, e.g. interannual or interdecadal variability, becomes available, but the fixed model complexity would not have the capacity to learn the additional structure in the data. If model complexity can be changed during the updating process, this may enhance the prediction skills.
B., Saratchandran, P., and Sundararajan, N. (2006). A fastand accurate on-line sequential learning algorithm for feedforward networks. IEEETransactions on Neural Networks, 17(6):1411–1423.Lima, A. R., Cannon, A. J., and Hsieh, W. W. (2015). Nonlinear regression in envi-ronmental sciences using extreme learning machines: A comparative evaluation.Environmental Modelling and Software, 73:175–188.Lu, W. Z. and Wang, W. J. (2005). Potential assessment of the ‘support vector’ machinemethod in forecasting ambient air pollution trends. Chemosphere, 59(5):693–701.Luecken, D. J., Hutzell, W. T., and Gipson, G. L. (2006). Development and analysisof air quality modeling simulations for hazardous air pollutants. AtmosphericEnvironment, 40(26):5087–5096.91BIBLIOGRAPHYMahfouf, J.-F., Brasnett, B., and Gagnon, S. (2007). A Canadian precipitation analysis(CaPA) project: description and preliminary results. Atmosphere-Ocean, 45:1–17.Maier, H. R., Jain, A., Dandy, G. C., and Sudheer, K. (2010). Methods used for thedevelopment of neural networks for the prediction of water resource variables inriver systems: Current status and future directions. Environmental Modeling andSoftware, 25:891–909.Marzban, C. and Stumpf, G. (1996). A neural network for tornado prediction basedon Doppler radar-derived attributes. Journal of Applied Meteorology, 35:617–626.McGranahan, G. and Murray, F., editors (2003). Air Pollution and Health in RapidlyDeveloping Countries. Earthscan Publications Ltd.Milford, J. B., Russell, A. G., and McRae, G. J. (1989). A new approach to photo-chemical pollution control: Implications of spatial patterns in pollutant responsesto reductions in nitrogen oxides and reactive organic gas emissions. EnvironmentalScience and Technology, 23(10):1290–1301.Moran, M. D., S. Me´nard, R. P., Anselmo, D., Antonopoulos, S., Makar, P. A., Gong,W., Gravel, S., Stroud, C., Zhang, J., Zheng, Q., Robichaud, A., Landry, H.,Beaulieu, P., Gilbert, S., Chen, J., and Kallaur, A. (2014). 
Recent advances incanada’s national operational aq forecasting system. In Steyn, D. G., Builtjes,P. J. H., and Timmermans, R. M. A., editors, Air Pollution Modeling and itsApplication XXII, chapter 37. NATO Science for Peace and Security Series C:Environmental Security.Nejadkoorki, F. and Baroutian, S. (2012). Forecasting extreme PM10 concentrationsusing artificial neural networks. International Journal of Environmental Research,6:277–284.Nguyen, D. (2014). A brief review of air quality models and their applications. OpenJournal of Atmospheric and Climate Change, 1(2):60–80.Nordiska, M., editor (2008). Interaction between climate change, air pollution andrelated impacts. Nordic Council of Ministers’ publishing house.Oke, T. R., Cleugh, H. A., Grimmond, S., Schmid, H. P., and Roth, M. (1989). Eval-uation of spatially averaged fluxes of heat, mass and momentum in the urbanboundary layer. Weather Climate, 9:14–21.92BIBLIOGRAPHYParviainen, E. and Riihimaki, J. (2013). A connection between extreme learning ma-chine and neural network kernel. Knowledge Discovery, Knowledge Engineeringand Knowledge Management Communications in Computer and Information Sci-ence, 272:122–135.Perez, P. (2001). Prediction of sulfur dioxide concentrations at a site near downtownSantiago, Chile. Atmospheric Environment, 35:4929–4935.Perez, P., Trier, A., and Reyes, J. (2000). Prediction of PM2.5 concentrations severalhours in advance using neural networks in Santiago, Chile. Atmospheric Environ-ment, 34:1189–1196.Prybutok, V. R., Yi, J., and Mitchell, D. (2000). Comparison of neural network modelwith ARIMA and regression models for prediction of Houston’s daily maximumozone concentrations. European Journal of Operational Research, 122:31–40.R Development Core Team (2011). R: A language and environment for statisticalcomputing. R Foundation for Statistical Computing. Vienna, Austria. ISBN 3-900051-07-0.Ramanathan, V. and Feng, Y. (2009). 
Air pollution, greenhouse gases and climatechange: Global and regional perspectives. Atmospheric Environment, 43:37–50.Rasouli, K., Hsieh, W. W., and Cannon, A. J. (2012). Daily stream flow forecasting bymachine learning methods with weather and climate inputs. Journal of Hydrology,414:284–293.Reich, S. L., Gomez, D. R., and Dawidowski, L. E. (1999). Artificial neural networkfor the identification of unknown air pollution sources. Atmospheric Environment,33(18):3045–3052.Revlett, G. (1978). Ozone forecasting using empirical modeling. Journal of Air Pollu-tion Control Association, 28:338–343.Roadknight, C. M., Balls, G. R., Mills, G. E., and Palmer, B. D. (1997). Modelingcomplex environmental data. IEEE Transactions on Neural Networks, 8(4):852–861.Romero, E. and Alque˜zar, R. (2012). Comparing error minimized extreme learningmachines and support vector sequential feed-forward neural networks. Neural Net-works, 25:122–129.93BIBLIOGRAPHYRong, H. J., Ong, Y. S., Tan, A. H., and Zhu, Z. (2008). A fast pruned-extreme learningmachine for classification problem. Neurocomputing, 72:359–366.Roth, M. (2000). Review of atmospheric turbulence over cities. Quarterly Journal ofthe Royal Meteorological Society, 126(564):942–990.Russell, A. and Dennis, R. (2000). Narsto critical review of photochemical models andmodeling. Atmospheric Environment, 34(12):2283–2324.Schmidt, W. F., Kraaijveld, M. A., and Duin, R. P. W. (1992). Feed forward neuralnetworks with random weights. In 11th IAPR International Conference on PatternRecognition, Proceedings, Vol Ii: Conference B: Pattern Recognition Methodologyand Systems, pages 1–4.Seinfeld, J. H. and Pandis, S. N. (1997). Atmospheric Chemistry and Physics from AirPollution to Climate Change. Wiley-Interscience.Song, X. H. and Hopke, P. K. (1996). Solving the chemical mass balance problem usingan artificial neural network. Environmental Science and Technology, 30(2):531–535.Sun, Z. L., Choi, T. M., Au, K. F., and Yu, Y. (2008). 
Sales forecasting using extremelearning machine with applications in fashion retailing. Decision Support Systems,46(1):411–419.Tai, P. K. A. P. K. (2012). Impact of Climate Change on Fine Particulate Matter AirQuality. PhD thesis, Harvard University.Thimm, G. and Fiesler, E. (1997). High-order and multilayer perceptron initialization.IEEE Transactions on Neural Networks, 8:349–359.Thirumalaiah, K. and Deo, M. C. (1998). River stage forecasting using artificial neuralnetworks. Journal of Hydrologic Engineering, 3(1):26–32.Tripathi, S., Srinivas, V., and Nanjundiah, R. S. (2006). Downscaling of precipitationfor climate change scenarios: A support vector machine approach. Journal ofHydrology, 330:621–640.Wallace, J. and Kanaroglou, P. (2008). Modeling NOx and NO2 emissions from mobilesources: A case study for Hamilton, Ontario, Canada. Transportation ResearchPart D: Transport and Environment, 13(5):323–333.94BIBLIOGRAPHYWalter, A., Denhard, M., and Schonwiese, C.-D. (1998). Simulation of global and hemi-spheric temperature variations and signal detection studies using neural networks.Meteorologische Zeitschrift, N.F.7:171–180.Wang, W., Xu, Z., and Lu, J. W. (2003). Three improved neural network models forair quality forecasting. Engineering Computations, 20(2):192–210.Wilson, L. J. and Valle´e, M. (2002). The Canadian updateable model output statistics(UMOS) system: design and development tests. Weather Forecast, 17:206–222.Wilson, L. J. and Valle´e, M. (2003). The Canadian updateable model output statistics(UMOS) system: validation against perfect prog. Weather Forecast, 18:288–302.Wolff, G. T. and Lioy, P. J. (1978). An empirical model for forecasting maximum dailyozone levels in the northeastern U.S. Journal of Air Pollution Control Association,28:5087–5096.Wotawa, G., Stohl, A., and Neininger, B. (1998). The urban plume of vienna: Com-parisons between aircraft measurements and photochemical model results. Atmo-spheric Environment, 32:2479–2489.Yeu, C. T., Lim, M. 
H., Huang, G. B., Agarwal, A., and Ong, Y. S. (2006). A new ma-chine learning paradigm for terrain reconstruction. IEEE Geoscience and RemoteSensing Letters, 3(3):382–386.Yi, J. and Prybutok, V. R. (2002). A neural network model forecasting for predictionof daily maximum ozone concentration in an industrialized urban area. Environ-mental Pollution, 92(3):349–357.Yura, E. A., Kear, T., and Niemeier, D. (2007). Using CALINE dispersion to assessvehicular PM2.5 emissions. Atmospheric Environment, 41(38):8747–8757.Yuval (2000). Neural network training for prediction of climatological time series;regularized by minimization of the generalized cross validation function. MonthlyWeather Review, 128:1456–1473.Yuval (2001). Enhancement and error estimation of neural network prediction of ElNin˜o - 3.4 SST anomalies. Journal of Climate, 14:2150–2163.95

