TRANSFERABILITY OF COMMUNITY-BASED MACRO-LEVEL COLLISION PREDICTION MODELS FOR USE IN ROAD SAFETY PLANNING APPLICATIONS

by

Bidoura Khondaker

B.Sc. in Civil Engineering, Bangladesh University of Engineering & Technology, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Civil Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

June 2008

© Bidoura Khondaker, 2008

ABSTRACT

This thesis proposes a methodology and guidelines for transferring community-based macro-level collision prediction models (CPMs) for use in road safety planning applications, so that models developed in one spatial-temporal region can be used in a different spatial-temporal region. To this end, the macro-level CPMs developed for the Greater Vancouver Regional District (GVRD) by Lovegrove and Sayed (2006, 2007) were used in a model transferability study. Using those GVRD models and data from the Central Okanagan Regional District (CORD), in the Province of British Columbia, Canada, a transferability test was conducted that involved recalibrating the 1996 GVRD models to Kelowna in a 2003 context. The case study was carried out in three parts. First, macro-level CPMs for the City of Kelowna were developed using 2003 data, following the earlier research on GVRD CPM development and use. Next, the 1996 GVRD models were recalibrated to see whether they could yield reliable safety estimates for Kelowna in a 2003 context. Finally, the results of Kelowna's own developed models and of the transferred models were compared to determine which models yielded better results. The results of the transferability study revealed that macro-level CPM transferability was possible and no more complicated than micro-level CPM transferability.
To facilitate the development of reliable community-based, macro-level collision prediction models, it was recommended that CPMs be transferred rather than developed from scratch whenever and wherever communities lack sufficient data of adequate quality. Therefore, the transferability guidelines in this research, together with their application in the case studies, have been offered as a contribution towards model transferability for road safety planning applications, so that models developed in one spatial-temporal region can be used in a different spatial-temporal region.

TABLE OF CONTENTS

ABSTRACT ii
TABLE OF CONTENTS iii
LIST OF TABLES vii
LIST OF FIGURES viii
ACKNOWLEDGEMENTS ix
1. INTRODUCTION
1.1 Background 1
1.2 Transferability of Community-Based Macro-Level CPMs 2
1.3 Objectives of the Research 3
1.4 Structure of the Thesis 4
2. LITERATURE REVIEW 5
2.1 Introduction 5
2.2 Traditional Road Safety Improvement Program 5
2.3 Proactive Road Safety Improvement Program 6
2.4 Development of Macro-Level CPMs 8
2.4.1 Regression Technique 8
2.4.2 Model Form 9
2.4.3 The GLIM Process 10
2.4.4 Selection of Explanatory Variables 11
2.4.5 Assessment of Model Goodness of Fit 12
2.4.6 Performing Outlier Analysis 13
2.5 Previous Work on Community-Based Macro-Level Collision Prediction Models 14
2.6 Community-Based Macro-Level CPM Research by Lovegrove (2006, 2007) 17
2.6.1 Stratification of Variables 17
2.6.2 Screening of Variables 18
2.6.3 Macro-Level Model Development 20
2.6.4 Model Development Results 20
2.7 Model Transferability across Space and Time 21
2.7.1 Previous Work on Model Transferability 21
2.7.2 Shortcomings of Previous Work on Transferability of Models 25
2.7.3 Two Alternative Methods for Model Transferability 25
2.7.4 Macro-Level CPM Transferability by Maximum Likelihood Procedure 26
2.8 Summary 27
3.
METHODOLOGY FOR DATA AGGREGATION & TRANSFERABILITY OF MODELS 29
3.1 Introduction 29
3.2 Geographic Scope of Data 29
3.3 Aggregation Units 31
3.4 Variable Definitions 33
3.5 Selection of Candidate Variables 35
3.6 Stratification of Variables 36
3.7 Aggregation of Each Candidate Variable 39
3.7.1 Exposure Variables 39
3.7.2 Socio-Demographic (S-D) Variables 41
3.7.3 Network Variables 43
3.7.4 Transportation Demand Management (TDM) Variables 46
3.7.5 Collision Variables 47
3.8 Development of Macro-Level CPMs 49
3.8.1 Background 50
3.8.2 Model Groupings 50
3.8.3 Model Form 52
3.8.4 Model Development 52
3.8.5 Assessing Model Goodness of Fit 53
3.8.6 Performing Outlier Analysis 53
3.9 Macro-Level CPM Transferability 54
3.10 Summary 56
4. MODEL DEVELOPMENT AND TRANSFERABILITY RESULTS 57
4.1 Introduction 57
4.2 Development of Macro-Level CPMs: GVRD vs. Kelowna 57
4.2.1 Urban Models 57
4.2.2 Rural Models 64
4.3 Transferred Models 70
4.3.1 Urban Models 70
4.3.2 Rural Models 76
4.4 Transferred vs. Developed Models 82
4.4.1 Urban Models 83
4.4.2 Rural Models 83
4.5 Summary 84
5.
CONCLUSIONS AND RECOMMENDATIONS 85
5.1 Introduction 85
5.2 Summary and Conclusions 85
5.3 Research Contributions 86
5.3.1 Macro-level CPM Transferability Between Time-Space Regions is Feasible and Successful 86
5.3.2 It is Beneficial to Transfer Models Rather Than Developing Own Models For Some Communities 87
5.3.3 Transferability is Desirable Whenever a Community Lacks Good Quality Data, Irrespective of Number of Available Data Points 87
5.3.4 Demonstration of the Validity of Some of the Transferred Models 87
5.3.5 The Value of Shape Parameter K of the Transferred Model is Strongly Influenced by the Sample Size 88
5.3.6 GIS Can be Implemented as a Useful Tool in Safety Analysis 88
5.4 Recommendations for Future Research 89
REFERENCES 91
APPENDICES 96
APPENDIX A: GLIM4 OUTPUT SAMPLE FOR MODEL DEVELOPMENT 96
APPENDIX B: GLIM4 OUTPUT SAMPLE FOR MODEL RECALIBRATION 99

LIST OF TABLES

Table 2.1 Candidate Variables - Collisions, Exposure, Socio-Demographic 19
Table 2.2 Candidate Variables - Transportation Demand Management and Network 29
Table 3.1 Urban and Rural Classes According to Population Density 34
Table 3.2 Candidate Variables - Collisions, Exposure, S-D, TDM, Network 37
Table 3.3 Incident Severity as Defined by ICBC 47
Table 3.4 Model Groups 51
Table 4.1 Urban Models: GVRD vs. Kelowna 58
Table 4.2 Rural Models: GVRD vs.
Kelowna 65
Table 4.3 Transferability Results for the Kelowna Urban Models 71
Table 4.4 Transferability Results for the Kelowna Rural Models 77

LIST OF FIGURES

Figure 3.1 City of Kelowna 30
Figure 3.2 City Sectors in Kelowna 31
Figure 3.3 Kelowna TMODELTM2 TAZ boundary & Census Tract Boundary 33
Figure 3.4 Rural and Urban Zones in Kelowna 35
Figure 3.5 Road Network 40
Figure 3.6 Population Density in the City of Kelowna 42
Figure 3.7 Illustration of process of splitting the road network by traffic zone system 44
Figure 3.8 Snapshot of a GIS file for calculating Total Road Kilometre 45
Figure 3.9 Collision Density for Total Collision 48
Figure 3.10 Collision Density for Severe Collision 49

ACKNOWLEDGEMENTS

I offer my enduring gratitude to my supervisor, Dr. Tarek Sayed, for the technical support and advice that he extended to me during the course of this project. I specially thank Dr. Gordon Lovegrove, my co-supervisor, for enlarging my vision of engineering and providing coherent answers to my endless problems. My sincere thanks go to the Insurance Corporation of British Columbia (ICBC), Translink and the City of Kelowna Transportation Division for their data, without which this research would not have been possible. Finally, I would like to thank my husband Imran and my daughter Zara, who was born during this thesis birthing process, for the unconditional love, support and encouragement that they provided me in all my endeavours.

CHAPTER 1 INTRODUCTION

1.1 Background

Deaths, suffering, injuries and economic losses due to road collisions are a global calamity with an ever-rising trend. The enormous social and economic costs associated with this unacceptably high number of road collisions have been recognized as one of the most serious concerns for many decades. Worldwide, each year nearly 1.2 million people die and millions more are injured or disabled as a result of road collisions (UN, 2007).
The social and economic burden of road collisions in North America is also enormous. To address these serious and ever-rising problems, there exist many traditional road safety improvement programs (RSIPs), which generally focus on the identification, diagnosis, and remedy of existing collision-prone locations or “black spots”. This approach is reactive in nature and has proved to be very successful. However, applying this reactive approach requires that a considerable collision history exist, which usually comes at significant cost to communities. Hence, several road safety researchers have recognized that a more proactive approach needs to be taken to address this problem (de Leur and Sayed, 2003; Lovegrove and Sayed, 2006). A proactive approach is one that addresses road safety explicitly and focuses on predicting and improving the safety of planned facilities. The main goal of the proactive road safety approach is to evaluate safety throughout each stage of the planning process, thereby minimizing road safety risk and preventing black spots from occurring. However, a major barrier to this proactive approach has been the lack of the empirical tools needed to evaluate road safety from a macro-level or planning perspective. Recently, Lovegrove and Sayed (2006) and Lovegrove (2006, 2007) developed and used community-based macro-level collision prediction models (CPMs) in proactive road safety improvement programs for the Greater Vancouver Regional District (GVRD). The research by Lovegrove (2006, 2007) was successful in developing macro-level CPMs and describing model use guidelines that provided a safety planning decision support tool to community planners and engineers. While that research provided enough evidence that macro-level CPMs can serve as a reliable empirical tool, there are some issues regarding inconsistencies in developing such models.
The most important one is the lack of availability and quality of data. To develop macro-level CPMs, a statistically significant sample of quality data must be available, which may not be the case for many communities. Also, some communities may not have a central, geo-coded database and may have to rely on sporadic and less reliable data, which in turn puts model reliability in question. A recent technique for dealing with this issue is to focus on model transferability. Instead of each community developing its own models, it is desirable that models developed for one jurisdiction in one period of time be applicable to a different period in the same or another jurisdiction. Hence, the main goal of this thesis is to conduct research on community-based macro-level CPM transferability across spatial-temporal regions.

1.2 Transferability of Community-Based Macro-Level CPMs

Generally, developing community-based macro-level CPMs requires extensive time, cost and effort for data extraction, collection and analysis. Hence, if a model has been well developed in one context using good quality data and a sound methodology, it is cost effective to transfer that model to another context that has limited data available. There are two aspects of CPM transferability: spatial and temporal. The first aspect involves applying a model estimated for one jurisdiction to predict safety levels in a different jurisdiction for the same period of time. The second aspect involves applying a model estimated in one time period to predict safety levels in another time period for the same geographic region. These two aspects can be combined, which is referred to as “spatial-temporal model transferability” and is the main focus of this research.
There exist several research efforts on spatial-temporal model transferability, but most of them have involved trip generation models and aggregate and disaggregate mode choice models used in transportation planning (Ben-Akiva, 1981; Atherton and Ben-Akiva, 1976). However, little research has been conducted on the spatial-temporal transferability of community-based macro-level CPMs. Therefore, this research tests whether community-based macro-level CPMs can produce reliable safety estimates when transferred for use in different time periods and regions. The objectives of this thesis are described in the next section.

1.3 Objectives of the Research

Lovegrove (2006, 2007) developed thirty-five community-based macro-level collision prediction models for the Greater Vancouver Regional District (GVRD) using generalized linear regression modeling (GLIM) techniques assuming a negative binomial error structure. This thesis documents the use of several of those recently developed macro-level CPMs in a model transferability study. Using data from neighbourhoods in the GVRD and the City of Kelowna in the Central Okanagan Regional District (CORD), in the Province of British Columbia (BC), Canada, the main aim of this research is to conduct a study in which models developed in the GVRD using 1996 data are recalibrated for use in Kelowna using 2003 data. The specific objectives of this research are as follows:

1. First, to develop community-based macro-level CPMs for the City of Kelowna using 2003 data, following the methodology described by Lovegrove & Sayed (2006) on GVRD macro-level CPM development and use.

2. Second, to use several of the recently developed GVRD macro-level CPMs in a spatial-temporal CPM transferability study.
The study involved transferring the GVRD models in order to adapt them to the City of Kelowna, to see whether the recalibrated 1996 GVRD CPMs could reliably predict safety levels for Kelowna in a 2003 context.

3. Third, to compare the goodness of fit and reliability of these two new sets of community-based macro-level CPMs, one set developed “from scratch” using Kelowna 2003 data, and the other set transferred from 1996 GVRD to 2003 Kelowna.

4. Finally, to make recommendations on when to transfer and when to develop models for applications involving differing spatial-temporal regions.

Thus, the research proposes a methodology and guidelines for model transferability in road safety planning applications, so that models developed in one spatial-temporal region can be used in a different spatial-temporal region.

1.4 Structure of the Thesis

This thesis is divided into five chapters. Chapter one provides a brief introduction to the thesis, including the background, research objectives and thesis structure. Chapter two presents a comprehensive literature review of research on the development of community-based macro-level CPMs, including the spatial-temporal transferability of CPMs. Chapter three describes the methodology used in this research for data extraction and aggregation for macro-level model development. That chapter concludes by describing the development and transfer of CPMs using the aggregated data. Chapter four presents and analyses the resulting models and their transferability results, including the goodness of fit of the models. That chapter concludes with a comparison between the developed and transferred models. Finally, chapter five summarizes the research effort, along with conclusions and recommendations for future research.
CHAPTER 2 LITERATURE REVIEW

2.1 Introduction

This chapter presents a comprehensive literature review that describes the theoretical background of the research and previous work in the area of model transferability. The chapter is composed of six main sections. In section 2.2, traditional reactive road safety improvement programs are reviewed. In section 2.3, emerging proactive road safety improvement programs are discussed, including road safety audits, CPMs, sustainable road safety and road safety planning guidelines. The methodological issues in macro-level CPM development are described in section 2.4. Section 2.5 describes several studies on community-based macro-level collision prediction models and their potential as an improved empirical tool in safety planning applications. Section 2.6 details a recent research effort in this area by Lovegrove and Sayed (2006) and Lovegrove (2006, 2007), which is the main focus of this thesis. Finally, section 2.7 reviews the issues of CPM transferability across spatial-temporal regions.

2.2 Traditional Road Safety Improvement Program

Road safety has been increasingly regarded as one of the most important transportation issues. The social and economic cost associated with road collisions in North America is enormous. To address this serious and ever-rising problem, the majority of road authorities have initiated and established Road Safety Improvement Programs (RSIPs). Traditional RSIPs generally focus on (i) identification, (ii) diagnosis and (iii) remedy of existing collision-prone locations or “black spots”. This traditional reactive engineering approach has been very effective in identifying and treating hazardous locations. However, there are some limitations associated with it. First, the program is reactive in nature, such that a considerable collision history must exist before action is taken.
Second, collision data for this reactive approach may not be of good quality, which affects the outcome of the program. These shortcomings of the traditional approach suggest that it is more appropriate to incorporate safety in the planning process, so that potential problems can be detected and designs revised, if necessary, before the start of construction. Therefore, the need to explicitly and proactively consider road safety in the planning process has been gaining acceptance among road safety researchers. The next section describes this recently emerged proactive road safety improvement program in greater detail.

2.3 Proactive Road Safety Improvement Program

In recent years, the consideration of road safety in regional transportation planning has emerged as a strategic direction for improving the overall safety of the transportation network. Consideration of road safety at the planning level has been accomplished by introducing the concept of safety conscious planning. The major goal of safety conscious planning is that safety should be routinely, comprehensively and effectively incorporated throughout each stage of the planning process, thereby minimizing road safety risk and preventing hazardous locations from occurring. Therefore, the proactive approach has emerged as a strategy for improving safety in recent years. However, this approach can only be effective if supported by reliable empirical tools. Although sufficient reliable empirical tools exist for the traditional reactive road safety engineering approach, proactive road safety tools are at a relatively early stage of development and are sometimes not considered reliable (de Leur & Sayed, 2003; Herbel, 2004).
The need for a reliable tool to incorporate safety in planning was also felt by the National Cooperative Highway Research Program (NCHRP), which commissioned a new project (NCHRP Project 8-44: Incorporating Safety into Long-Range Transportation Planning) in 2005. The objective of this project was to develop a guidebook for practitioners that would enable them to identify and evaluate alternative ways to incorporate safety more effectively in long-range planning and decision making. Although empirical tools for the proactive approach are lacking, planners and engineers still need them to estimate the level of safety of planned projects, of any design changes to those projects, and of other proposed safety improvement programs. Currently, there are a few developments in proactive empirical tools, such as road safety audits, CPMs, Sustainable Road Safety programs, and Road Safety Risk Indices, which are briefly discussed in the following paragraphs.

Road Safety Audits (RSAs) can be employed reactively at an existing location that has been identified as hazardous, or proactively in road design and planning. Micro-level CPMs can only aid RSAs involving a single site where exposure is known. However, more emphasis was needed on their application to developing planning-level strategies. Regarding CPMs, several efforts (Ho & Guarnaschelli, 1998; Lord & Persaud, 2004) have been made to combine micro-level CPMs with a regional transportation planning model (e.g. EMME/2) to carry out proactive planning-level analysis. However, these efforts were unsuccessful because micro-level CPMs can only predict the level of safety at a single site, and because traffic forecasts at any one location derived from long-term planning-level analyses were inaccurate. Hence, the need for macro-level CPMs was felt, and research to develop macro-level CPMs was initiated in the mid-1990s.
The Dutch Sustainable Road Safety (SRS) program, initiated in 1986, was a community-based proactive road safety strategy for reducing severe collisions (Wegman, 1996; van Schagen & Janssen, 2000). The Dutch SRS program was based significantly on Transportation Demand Management (TDM), which was considered an effective SRS strategy because its major aim was to reduce traffic volumes and, consequently, collisions. However, all SRS program forecasts were based on a linear exposure-collision relationship, which was an erroneous assumption. While many studies (Poppe, 1995, 1997a, 1997b) were in progress in search of improved empirical tools, including the development of a safety and environmental impact module that combines with a traditional regional transportation planning model, no attempts were made to use these tools in the development of a macro-level CPM.

In North America, the Road Safety Planning Framework (RSPF) and associated Road Safety Risk Index (RSRI) guidelines developed by de Leur & Sayed (2002) form the basis for the development of macro-level CPMs, which facilitate the proactive road safety planning approach. De Leur and Sayed (2002) recommended the use of the road safety assessment guidelines in order to quantify an RSRI for each planned or built facility. This RSRI is now approved by the MoTH Highway Safety Branch as a valuable decision-aid tool that generates consistent ratings independent of the observer and helps formulate appropriate road improvement strategies. However, the subjective nature of determining the RSRI suggests that additional modification is necessary in this area to provide a planning-level collision prediction model.

Given the recognized need for proactive road safety planning tools and the paucity of relevant research in this area, an improved tool is necessary.
The weakness of the traditional approach must be overcome, and there is a need to develop better tools and criteria for assessing the safety impacts of long-range transportation alternatives. Recent research and development on community-based macro-level CPMs for use in safety planning applications was a necessary first step in this regard. The following section reviews the procedures involved in macro-level CPM development.

2.4 Development of Macro-Level CPMs

The purpose of this section is to review the procedure for macro-level CPM development, which includes the regression technique, model forms, the GLIM process, selection of explanatory variables, goodness of fit of the developed models, and model refinement by outlier analysis.

2.4.1 Regression Technique

Many collision prediction models currently in use rely on conventional linear regression, assuming a normal error structure for the response variable, a constant variance for the residuals, and a linear relationship between the response and explanatory variables. However, researchers have shown that the conventional linear regression technique is inappropriate for modeling discrete, non-negative, and rare events such as collisions (Hauer et al. 1988; Miaou and Lum 1993). These researchers also suggested a Poisson or negative binomial error structure for modeling collision frequency. Currently, generalized linear regression modeling (GLIM) is used extensively for the development of collision prediction models, as it overcomes the shortcomings of the conventional linear regression technique. Several GLIM software packages are now available that can model data following a wide range of probability distributions within the exponential family, such as the normal, Poisson, binomial, negative binomial, gamma, and others. GLIM software packages also have the advantage of converting nonlinear model forms into linear forms through several built-in link functions.
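The role of the log link can be sketched in a few lines of code. The following is a minimal illustration, written in Python rather than in GLIM4, that fits a single-exposure model of the assumed form E(A) = a0 × V^a1 with a Poisson error structure by iteratively reweighted least squares. The function name `fit_poisson_loglink`, the fixed iteration count and the sample data are hypothetical and are not taken from the thesis. A negative binomial fit would additionally require estimating the shape parameter.

```python
import math


def fit_poisson_loglink(exposure, collisions, iterations=25):
    """Fit E(A) = a0 * V**a1 (Poisson error, log link) by IRLS.

    Hypothetical helper for illustration only; not GLIM4 syntax.
    Returns the pair (a0, a1).
    """
    x = [math.log(v) for v in exposure]          # log link makes the model linear in ln(V)
    b0 = math.log(sum(collisions) / len(collisions))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Working response for the log link; the Poisson working weights equal mu.
        z = [b0 + b1 * xi + (yi - mi) / mi
             for xi, yi, mi in zip(x, collisions, mu)]
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        # Solve the 2x2 weighted least-squares normal equations directly.
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return math.exp(b0), b1


if __name__ == "__main__":
    # Made-up zonal exposure values and collision counts, for illustration only.
    zone_exposure = [120.0, 340.0, 560.0, 900.0, 1500.0]
    zone_collisions = [9, 17, 20, 31, 38]
    a0, a1 = fit_poisson_loglink(zone_exposure, zone_collisions)
    print(f"E(A) = {a0:.3f} * V^{a1:.3f}")
```

Because the fit is carried out on ln(V), the predicted collision frequency can never be negative, which is the property that conventional linear regression lacks.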
2.4.2 Model Form

Sawalha & Sayed (1999) recommended that the mathematical form of any CPM should satisfy two conditions: (i) the model should yield logical results, meaning that it must not predict a negative number of collisions and must ensure zero collision frequency for zero exposure; and (ii) there must exist a known link function that can linearize the model form for coefficient estimation. These two conditions are satisfied by a model form that consists of the product of powers of the exposure measures multiplied by an exponential incorporating the remaining explanatory variables. This type of model form can also be linearized by a logarithmic link function. Therefore, the recommended model form for the expected collision frequency at intersections can be expressed as:

$$E(A) = a_0 \, V_1^{a_1} \, V_2^{a_2} \, e^{\sum_j b_j x_j} \qquad (2.1)$$

The recommended model form for the expected collision frequency on road segments can be expressed as:

$$E(A) = a_0 \, L^{a_1} \, V^{a_2} \, e^{\sum_j b_j x_j} \qquad (2.2)$$

where $E(A)$ = predicted collision frequency; $V$, $V_1$, $V_2$ = road section and intersection major/minor AADT; $L$ = section length; $x_j$ = any variable additional to $L$ and $V$; and $a_0, a_1, a_2, b_j$ = model parameters.

Using a logarithmic link function, the model can be transformed into a log-linear form; for the intersection model of Equation 2.1 this is:

$$\ln[E(A)] = \ln(a_0) + a_1 \ln(V_1) + a_2 \ln(V_2) + \sum_j b_j x_j \qquad (2.3)$$

2.4.3 The GLIM Process

Once a model form has been chosen, the model parameters can be estimated using one of several GLIM statistical software packages (e.g. GLIM4, SAS). The decision on whether to use a Poisson or a negative binomial error structure is based on a methodology proposed by Bonneson and McCoy (1993). First, the model parameters are estimated assuming a Poisson error structure, and a dispersion indicator is calculated using Equation 2.4:

$$\sigma_d = \frac{\text{Pearson } \chi^2}{n - p} \qquad (2.4)$$

where n is the number of observations,
p is the number of model parameters, and Pearson $\chi^2$ is defined as:

$$\text{Pearson } \chi^2 = \sum_{i=1}^{n} \frac{[y_i - E(A_i)]^2}{Var(y_i)} \qquad (2.5)$$

where $y_i$ = observed number of collisions at location i, $E(A_i)$ = predicted number of collisions at location i, and $Var(y_i)$ = variance of the observed mean collision frequency at location i.

If the dispersion indicator $\sigma_d$ from Equation 2.4 exceeds one, the data are considered over-dispersed and a negative binomial error distribution is assumed (Sayed & de Leur, 2001). Thus, once the model form, logarithmic link function and error distribution have been specified, the GLIM4 software estimates the model parameters and the value of the over-dispersion or shape parameter $\kappa$ using one of three available methods, namely the maximum likelihood approach, the expected value of the $\chi^2$ statistic, or the mean scaled deviance method. However, several researchers (Sawalha & Sayed, 2005a; Miaou, 1996) have shown the maximum likelihood method to be superior to the other methods for estimating $\kappa$. Hence, throughout this thesis, the maximum likelihood method, which involves an iterative process, was used to estimate $\kappa$. The next step in developing a model is to select which explanatory variables should be included.

2.4.4 Selection of Explanatory Variables

For selecting explanatory variables during model development, Sawalha & Sayed (2005a) recommended a forward stepwise procedure in which variables are added to the model one by one and tested for significance. The decision on whether a variable should be retained in the model is based on several criteria. First, the t-ratio of the added variable’s estimated coefficient must be significant at the 95% confidence level. Second, the sign (+/-) of its coefficient should agree with intuitive expectations.
Lastly, the addition of the variable should cause a significant drop in the Scaled Deviance (SD) at the 95% level of confidence, where the SD (if the error structure is Poisson distributed) is defined as:

$$SD = 2 \sum_{i=1}^{n} y_i \ln\left[\frac{y_i}{E(A_i)}\right] \qquad (2.6)$$

If the error structure follows the negative binomial distribution, the SD is:

$$SD = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{E(A_i)}\right) - (y_i + \kappa) \ln\left(\frac{y_i + \kappa}{E(A_i) + \kappa}\right) \right] \qquad (2.7)$$

The SD is asymptotically $\chi^2$ distributed with n - p - 1 degrees of freedom. Therefore, a variable is retained if its addition causes a drop in SD exceeding $\chi^2_{0.05,1} = 3.84$. The selection of explanatory variables ends when the desired variables have met the above criteria.

2.4.5 Assessment of Model Goodness of Fit

There are several measures for assessing the goodness of fit of GLIM models. The most commonly used are the Pearson $\chi^2$ statistic and the Scaled Deviance (SD), defined in Equations 2.5 and 2.7 respectively. For a model to fit well, both the Pearson $\chi^2$ and SD statistics should be less than the $\chi^2$ distribution value with (n - p - 1) degrees of freedom at the 95% confidence level. There also exist several subjective measures of model goodness of fit. One such measure is the plot of the average of squared residuals (SR) versus the predicted collision frequency, where the average of squared residuals is defined as:

$$\text{Average of SR} = \frac{\sum_{i=1}^{n} [E(A_i) - y_i]^2}{n} \qquad (2.8)$$

For a well-fitted model, all points should lie around the variance function line for a negative binomial distribution, where the variance function is defined as:

$$Var(y_i) = E(y_i) + \frac{E(y_i)^2}{\kappa} \qquad (2.9)$$

Another subjective measure is the plot of the Pearson Residual (PR) versus the predicted collision frequency, where the PR is defined as:

$$PR_i = \frac{E(A_i) - y_i}{\sqrt{Var(y_i)}} \qquad (2.10)$$

For a well-fitted model, the $PR_i$ values should be clustered around zero over the full range of predictions. In addition, the statistical significance of the variables in a model is assessed using the t ratio, which is the ratio between the variable coefficient and its standard error.
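The goodness-of-fit measures in Equations 2.5, 2.7, 2.9 and 2.10 can be sketched in code as follows. This is a minimal Python illustration under the negative binomial assumption, not the GLIM4 output used in the thesis. The function names are hypothetical, and the shape parameter `kappa` is taken as already estimated.

```python
import math


def nb_variance(mu, kappa):
    """Negative binomial variance function, Var(y) = mu + mu**2 / kappa (Eq. 2.9)."""
    return mu + mu * mu / kappa


def pearson_chi2(observed, predicted, kappa):
    """Pearson chi-square statistic (Eq. 2.5) with the variance from Eq. 2.9."""
    return sum((y - mu) ** 2 / nb_variance(mu, kappa)
               for y, mu in zip(observed, predicted))


def scaled_deviance_nb(observed, predicted, kappa):
    """Scaled deviance for a negative binomial error structure (Eq. 2.7)."""
    total = 0.0
    for y, mu in zip(observed, predicted):
        # The term y * ln(y / mu) tends to zero as y tends to zero.
        first = y * math.log(y / mu) if y > 0 else 0.0
        total += first - (y + kappa) * math.log((y + kappa) / (mu + kappa))
    return 2.0 * total


def pearson_residuals(observed, predicted, kappa):
    """Pearson residuals, predicted minus observed over sqrt(Var) (Eq. 2.10)."""
    return [(mu - y) / math.sqrt(nb_variance(mu, kappa))
            for y, mu in zip(observed, predicted)]
```

Both statistics would then be compared with the tabulated chi-square value for the degrees of freedom stated in the text, and the Pearson residuals plotted against the predicted collision frequency.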
This t ratio is one of the criteria for retaining a variable in the model; its critical value is 1.96 at the 5 percent level of significance.

2.4.6 Performing Outlier Analysis

When developing models, the dataset may sometimes contain a few unusual or extreme observations called outliers. Outliers exist in a dataset because they are genuinely different from the rest of the data or, sometimes, because of poor data collection and recording. The method for removing these extreme data points has been described by Sayed & Rodriguez (1999) and Sawalha & Sayed (2005a). They recommended a measure of the influence of an extreme data point called the Cook’s Distance (CD). Mathematically, CD is defined as:

$$CD_i = \frac{h_i}{p \, (1 - h_i)} \left(r_i^{PS}\right)^2 \qquad (2.11)$$

where $r_i^{PS}$ is the standardized residual of data point i,

$$r_i^{PS} = \frac{PR_i}{\sqrt{1 - h_i}} = \frac{E(A_i) - y_i}{\sqrt{(1 - h_i) \, Var(y_i)}} \qquad (2.12)$$

and $h_i$ is the leverage value. Generally, data points with a high CD value are influential points in a given dataset. The procedure for removing outliers described by Sawalha & Sayed (2005a) can be summarized as follows:

1. First, the data are sorted in descending order of CD value.
2. Data points with the largest CD values are removed one by one, and the drop in SD is noted after removing each point.
3. Data points that cause a significant drop in the SD are considered influential outliers.

This outlier removal procedure is the last step in refining the overall model fit when developing CPMs.

2.5 Previous Work on Community-Based Macro-Level Collision Prediction Models

There has been considerable research to develop models for predicting the number of collisions on individual elements of the transportation network (Lord, 2000; Sawalha and Sayed, 2001). However, few of these studies attempted to develop models at a more aggregate zonal level that could be used as part of the transportation planning process. Levine et al.
(1995) developed an aggregate model for Hawaii that relates zonal collisions to area, population, employment (manufacturing, retail, service, military and financial sectors) and miles of freeway segments, major arterials, and freeway ramps. However, the model was assumed to be linear in its parameters, a form that had already been shown to be inappropriate for safety modelling; exponential-type models had been preferred instead. Lord (2000) aimed at developing a tool that would allow the estimation of traffic collisions on computerized or digital transportation networks during the planning process. Two sample digital road networks were created with the help of a regional transportation planning model (EMME/2), and the collision prediction models developed in that research were applied to those networks. Although the results showed that it was possible to predict the number of collisions on a digital network, the accuracy of prediction was directly related to the exactness of the traffic flow predictions from the EMME/2 software.

Of late, two attempts were made to develop macro-level CPMs using non-linear, non-Normal Generalized Linear Regression Modelling (GLIM) techniques, with more promising results. The first attempt was made by Hadayeghi et al. (2003), who developed a series of macro-level collision prediction models for estimating the number of collisions in planning zones as a function of zonal characteristics. Using GLIM techniques, the macro-level CPMs predicted the mean collision frequency in traffic zones, with data aggregated across 463 traffic zones in Toronto, Canada. The geo-coded collision data were grouped by zone using GIS tools. The socioeconomic and demographic data were obtained from a 1996 regional survey, and traffic demand data were obtained from the EMME/2 software package.
The methodology used in this research was a significant deviation from traditional CPM development research: rather than using data from an individual link or node, the macro-level CPMs used data summed across all nodes and/or links in each zone, across an entire community or region. The choice of variables and model development followed the usual forward stepwise procedure. Explanatory variables included population, households, employment and vehicle ownership in the socio-economic and demographic group; number of intersections, major and minor road kilometres and zonal area in the network group; and vehicle-kilometres-travelled (VKT), average zonal congestion (VC) and posted speed in the traffic demand group. The resultant macro-level CPMs, which predict either total or severe collisions for either all-day or rush-hour time periods, were based on the following mathematical form:

$$E(\Lambda) = a_0\, VKT^{\,b_0}\, e^{\sum_i b_i x_i} \tag{2.13}$$

Where:
$E(\Lambda)$ = Mean collision frequency;
$VKT$ = Zonal total of VKT from EMME/2;
$x_i$ = Explanatory variables aggregated zonally (e.g. employment, population, intersections);
$a_0$, $b_0$, $b_i$ = Model parameters.

The study concluded that increasing total yearly collisions were associated with increased VKT, households, major road kilometres and intersection density, in accord with intuition. However, decreasing collisions were associated with increased average posted speed and zonal average VC. The explanation offered was that higher speed limits are posted in zones that are usually safer, and that zones where the average V/C increases become more congested, so traffic operates at lower speeds and the number of collisions is reduced.
Regarding morning peak hour collisions, the results showed that increasing collisions were associated with increasing total employed labour force and minor road kilometres. This reflects that zones with a larger employed labour force tend to have more collisions in the morning peak period, and that zones with fewer roads are less likely to experience collisions than zones with more roads because of lower exposure. However, Hadayeghi et al. (2003) suggested that the models developed in this study were preliminary tools to incorporate road safety in the planning process and should not be used to assess traffic management strategies. The research therefore needed many improvements, for example the inclusion of more variables such as employment in the manufacturing, retail trade and financial sectors, land use and geometry of neighbourhood design, driver age, gender, road conditions, collision reporting practices and even police enforcement level.

Ladron de Guevara et al. (2004) developed CPMs for the Tucson, Arizona region. Using demographic and economic data from the region's 859 TAZs, CPMs were developed that associated increased collisions with increased population density, employment, intersection density, and arterial and collector roads. However, the issue of causation or correlation of these variables to collisions is yet to be fully verified.

Hadayeghi et al. (2006) did additional research on macro-level CPM development. The purpose of the research was to update and improve the previous work, in which a series of models had been developed with a limited number of explanatory variables and limited data. In this case, some new variables were considered, such as the effects of specific land uses, different types of employment and the presence of transit facilities. The data used in this study comprised collision, socioeconomic, demographic and road network characteristics, as well as road traffic volumes for the City of Toronto's 481 traffic zones.
For testing the goodness of fit of the models, the Cumulative Residuals (CURE) method (Hauer and Bamfo, 1997) was used in addition to other conventional goodness of fit measures. In the CURE method, the cumulative residuals are plotted against each explanatory variable. The goal of this method is to observe graphically how well the function fits the data. The cumulative residual plots for the covariate VKT for the total and severe collision models showed that the severe collision model fits the data with better accuracy than the total collision model. Other traffic variables such as average zonal posted speed, average zonal 85% operational speed and average zonal congestion were found to be insignificant in the total and severe collision models. The models revealed that collision frequency increases with an increase in the following explanatory variables: total arterial, collector, laneway and ramp kilometres, total road kilometres, number of signalized intersections, number of 4-legged and 3-legged signalized intersections, number of schools in each zone and type of dwelling unit. However, two other variables, total rail kilometres and total local road kilometres, were found to be inversely associated with collisions. This was explained by the fact that operating speeds are usually lower on local roads, resulting in fewer collisions. The study also attempted to develop several comprehensive models that use all of the above variables, including land use, network, traffic intensity, and socioeconomic and demographic variables. The comprehensive models were found statistically to perform better than the other models because of their lower dispersion parameters. Therefore, it was concluded that the comprehensive models were better tools for predicting the number of zonal collisions. The research by Hadayeghi et al.
(2006) was in essence similar to the research by Lovegrove et al. (2006, 2007) on community-based CPM development, which developed macro-level CPMs for the Greater Vancouver Regional District (GVRD) in BC, Canada. That research also developed model use guidelines and conducted several case studies on model use, including model transferability guidelines within the same spatial-temporal region. Since the focus of this thesis is to conduct a transferability study based on Lovegrove's GVRD macro-level CPMs, further details of that research are discussed in section 2.6.

2.6 Macro level CPM Research by Lovegrove (2006, 2007)

Lovegrove and Sayed (2006) developed macro-level CPMs using data for 577 neighbourhoods in the GVRD. In the study, thirty-five macro-level CPMs were developed to estimate collisions for a three-year period. The following sections provide brief details on this research, including information on stratification and screening of variables, the model development procedure, and the model development results.

2.6.1 Stratification of Variables

For the purpose of developing macro-level CPMs, data were stratified for both the dependent and independent variable sets. Stratification of data for the dependent (collision) variables used the following divisions: collision type, collision severity, collision time period, and collision location. Stratification of explanatory variables was done in three levels. The first level included four themes: exposure, Socio-Demographic (S-D), Transportation Demand Management (TDM), and network variables. The second level of stratification was based on two types of exposure variables, either measured or modeled. Measured data consisted of information that was obtained either digitally (using a GIS software tool) or manually (from land use, census data, and road maps). Modeled data consisted of traffic volume, speed, and congestion output from the GVRD's Emme/2 transportation planning model.
Since the characteristics of collisions vary between urban and rural areas, an urban or rural stratification was introduced as the third and final level of stratification for all variables.

2.6.2 Screening of Variables

Over 220 variables were screened for this research. They had been identified by a comprehensive literature search as being in common use for transportation planning and road safety studies. Several screening criteria were used to identify the most practical candidate variables for the models: (i) data should be easily available and cost effective, (ii) data should be accurate, (iii) the variables should be relatively easy for practitioners to understand, (iv) the explanatory variables had to be predictable, and (v) there needed to be sufficient data points for each zone. Based on these criteria, scores were given to each of the 220 variables, and an overall ranking identified the top-ranked 63 variables. Tables 2.1 and 2.2 list the top-ranked 63 variables according to the best-fit screening criteria.

[Tables 2.1 and 2.2: top-ranked 63 candidate variables (contents not recoverable)]

2.6.3 Macro-Level Model Development

Using the candidate variables, macro-level CPM development began on a group-by-group basis. Models were developed using the generalized linear regression modelling (GLIM) technique, assuming a negative binomial error structure. The GLIM regression method followed micro-level CPM development closely. Mathematically, the CPMs were based on the following equation form, which was considered more generic than that used by earlier researchers:

$$E(\Lambda) = a_0\, Z^{a_1}\, e^{\sum_i b_i x_i} \tag{2.14}$$

Where:
$E(\Lambda)$ = Predicted collision frequency;
$a_0$, $a_1$, $b_i$ = Model parameters;
$Z$ = Exposure variable;
$x_i$ = Explanatory variables.

Finally, selection of explanatory variables followed a forward stepwise procedure, as described in Sawalha & Sayed (2005a), and an outlier analysis using the Cook's Distance (CD) technique was done to further refine the models.
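The model form and fit statistics used in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the GLIM procedure used in the original research: the function names, the parameter values and the sample zone are hypothetical, and the convention that the term y ln(y/E(Λ)) equals zero when y = 0 is an assumption.

```python
import math

def predict(z, x, a0, a1, b):
    """Equation 2.14: E(Lambda) = a0 * Z**a1 * exp(sum_i b_i * x_i)."""
    return a0 * z ** a1 * math.exp(sum(bi * xi for bi, xi in zip(b, x)))

def nb_variance(mu, kappa):
    """Equation 2.9: Var(y) = E(y) + E(y)**2 / kappa."""
    return mu + mu ** 2 / kappa

def pearson_chi2(y, mu, kappa):
    """Pearson chi-square: squared residuals scaled by the NB variance."""
    return sum((yi - mi) ** 2 / nb_variance(mi, kappa) for yi, mi in zip(y, mu))

def scaled_deviance_nb(y, mu, kappa):
    """Equation 2.7 (negative binomial error structure)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Assumed convention: y * ln(y / mu) is taken as 0 when y == 0.
        first = yi * math.log(yi / mi) if yi > 0 else 0.0
        second = (yi + kappa) * math.log((yi + kappa) / (mi + kappa))
        total += first - second
    return 2.0 * total

# Hypothetical zone: exposure Z (e.g. VKT) and two zonal covariates.
mu_example = predict(z=1500.0, x=[2.1, 0.4], a0=0.01, a1=0.8, b=[0.05, 0.3])
```

In a forward stepwise search, the drop in `scaled_deviance_nb` between two nested models would be compared with 3.84, and both statistics would be compared with the chi-square value at n - p - 1 degrees of freedom.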
2.6.4 Model Development Results

The results of the developed models revealed that increased collisions were associated with increases in the following explanatory variables: (i) exposure related, which are vehicle kilometres travelled (VKT), total lane kilometres (TLKM), and average zonal congestion (VC); (ii) TDM related, which are shortcut capacity (SCC), shortcut attractiveness (SCVC), number of drivers (DRIVE), total commuters (TCM), and total commuter density (TCD); (iii) network related, which are signal density (SIGD), intersection density (INTD), arterial-local intersections (IALP), and total arterial road lane kilometres (ALKP); (iv) S-D related, which are job density (WKGD), population density (POPD), unemployment (UNEMP), and residential unit density (NHD). The models further revealed that decreased collisions were associated with increases in the following variables: (i) exposure related, none; (ii) TDM related, which are core size and percentage (CORE and CRP); (iii) network related, which are number of three-way intersections (I3WP) and local road lane kilometres (LLKP); (iv) S-D related, which is family size (FS). The study was successful in developing macro-level CPMs and describing model use guidelines that provided a safety planning decision support tool to community planners and engineers.

2.7 Model Transferability across Space and Time

Although the research by Lovegrove and Sayed (2006) provided sufficient evidence for using macro-level CPMs as part of road safety improvement programs (RSIPs), several issues remain regarding inconsistencies in model development and the lack of guidelines for model use. One such issue is the lack of availability and quality of data for developing planning-level safety prediction models, which was discussed in Chapter 1. One way for smaller communities to deal with this lack of data is to focus on model transferability. The next section provides a detailed discussion of model transferability and related issues.
2.7.1 Previous Work on Model Transferability

Several research efforts are in progress on how to transfer previously developed CPMs for use in regions and/or time periods other than the ones for which they were developed. Vogt and Bared (1998) used the Pearson $\chi^2$ statistic to test whether a negative binomial micro-level CPM that has been developed using one data set can generate reliable predictions for a totally different data set. They suggested that if a CPM applied to a new data set is correct, and if the observations in the new data set are independent, then the expected value and the standard deviation of the Pearson $\chi^2$ statistic are given by the following equations:

$$E(\chi^2) = N; \qquad \sigma(\chi^2) = \sqrt{2N\left(1+\frac{3}{\kappa}\right) + \sum_{i=1}^{N}\frac{1}{E(\Lambda_i)\left[1+E(\Lambda_i)/\kappa\right]}} \tag{2.15}$$

Where:
$N$ = the number of data points used to recalibrate the model;
$E(\Lambda_i)$ and $\kappa$ are taken from each recalibrated CPM.

A new score, called the "z" score, was then introduced. It measures how far the calculated $\chi^2$ is from its expected value and is given by:

$$z = \frac{\chi^2 - E(\chi^2)}{\sigma(\chi^2)} \tag{2.16}$$

Values of the "z" score close to zero suggest that the model was transferred successfully. However, Vogt and Bared (1998) did not set any limit on the "z" value beyond which the model would be considered non-transferable.

Harwood et al. (2000) proposed a procedure for micro-level model recalibration for use in the Interactive Highway Safety Design Model (IHSDM) of the US Federal Highway Administration (FHWA). All highway agencies in the United States are supposed to use the IHSDM, which is essentially road safety evaluation software for use on all types of highways in the United States. The IHSDM first developed a set of base models using data from certain states. It then developed a recalibration procedure intended to adapt the base models to the local conditions of any particular US highway jurisdiction, in order to account for the safety differences between jurisdictions in different geographical areas.
The safety differences may be due to differences in driver population, collision reporting threshold or even collision investigation practices. IHSDM's recalibration procedure can be briefly summarized in two steps:

(i) Using the related base model to obtain the safety estimates for a group of highway locations under the jurisdiction of a particular agency, and
(ii) Multiplying the previously obtained safety estimates by a calibration factor.

The calibration factor was defined as the ratio of the sum of observed accidents to the sum of accidents predicted by the base model. Mathematically,

$$F = \frac{\sum_i y_i}{\sum_i E(\Lambda_i)} \tag{2.17}$$

Where:
$\sum_i y_i$ = sum of observed collisions;
$\sum_i E(\Lambda_i)$ = sum of collisions predicted by the base model.

The IHSDM procedure of multiplying the collisions predicted by the base model by the calibration factor ($F$) is equivalent to multiplying the base model constant by $F$ (Sawalha and Sayed, 2005). Therefore, the IHSDM procedure amounts to recalibrating the constants of the models before using them to obtain safety estimates in another jurisdiction.

Persaud et al. (2002) carried out a research effort that principally used the IHSDM procedure to recalibrate micro-level CPMs for urban intersections. First, they developed micro-level collision prediction models for 3-legged and 4-legged signalized and unsignalized urban intersections in Toronto using 1990-1995 Toronto data. An attempt was then made to recalibrate the British Columbia and California models for Toronto conditions using the above-mentioned IHSDM procedure. The prediction accuracy of these models recalibrated with the IHSDM procedure was then assessed relative to the models directly calibrated using Toronto data by employing two visual tests. The first test involved plotting predictions from the three sets of models for selected minor AADTs, and the second test involved examining plots of the cumulative residuals in accordance with the "CURE" method suggested by Hauer & Bamfo (1997).
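The cumulative residuals behind a CURE plot can be illustrated with a minimal sketch. It assumes residuals are taken as observed minus predicted and accumulated in ascending order of the chosen covariate; the function name and the sample values are hypothetical, and the confidence limits that usually accompany a CURE plot are not computed here.

```python
def cumulative_residuals(covariate, observed, predicted):
    """Return (covariate value, running residual sum) pairs for a CURE plot."""
    # Sort sites by the covariate (e.g. minor AADT or VKT).
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    running = 0.0
    points = []
    for i in order:
        running += observed[i] - predicted[i]  # residual = observed - predicted
        points.append((covariate[i], running))
    return points

# Hypothetical data for three sites.
cure = cumulative_residuals([3, 1, 2], [5, 1, 2], [4, 2, 2])
```

A curve that wanders around zero without long upward or downward runs suggests that the functional form fits the data over the range of that covariate.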
The results of the micro-level model transferability test were mixed, suggesting that the IHSDM procedure be modified by employing calibration factors for different traffic volume strata as opposed to the single factor proposed by Harwood et al. (2000).

Hadayeghi et al. (2005) carried out the first study on temporal transferability and updating of macro-level collision prediction models. The objective of this research was twofold: first, to examine the temporal transferability of collision prediction models by using appropriate evaluation measures of predictive performance to assess whether the relationship between the dependent and independent variables holds reasonably well across time; and second, to compare alternative updating methods for temporal transferability. Data for this research were based on the City of Toronto's 463 traffic zones for the year 1996 and 481 traffic zones for the year 2001. A series of macro-level collision prediction models were developed using 1996 data, employing a generalized linear regression approach and assuming a negative binomial error structure. To attain the first objective, two evaluation measures were used to assess the effectiveness of the models when they were transferred:

(i) Nested likelihood ratio test (LR): which measures the statistical similarity of the coefficients between the original model and the transferred model, and
(ii) Transfer Index (TI): which is a relative measure that indicates how well a transferred model performs in predicting the application dataset relative to a locally estimated model.

To attain the second objective, two updating procedures were used:

(iii) Updating using a calibration factor: which was the IHSDM procedure for model updating, and
(iv) Bayesian updating: which combines sample information with prior information in order to achieve more accurate updated information and was described by Atherton and Ben-Akiva (1976).
The results of this research concluded that the transfer of macro-level CPMs was undesirable. Regarding model updating, the "updated" models developed using the IHSDM procedure performed better than both the 1996 models and the Bayesian updated models. This was because Bayesian updating assumes that the prior and posterior distributions of the parameters are normally distributed, which was an erroneous assumption.

2.7.2 Shortcomings of Previous Work on Transferability of Models

The "z" score introduced in the previous section for testing model transferability involves two terms, $\chi^2$ and $\sigma(\chi^2)$, whose values depend on the shape parameter $\kappa$ of the negative binomial error structure of the new data set. The original data set used for developing the models and the new data set to which a model has been transferred are two very distinct sets. The shape parameter $\kappa$ for the new data set relative to the transferred model is totally different from that of the original data set relative to the same model. Hence, the shape parameter $\kappa$ of the transferred model should be recalibrated before it is used to calculate the "z" score (Sawalha and Sayed, 2005b). The IHSDM model recalibration procedure described by Harwood et al. (2000) did not provide any scientific basis, and no method for testing model transferability was suggested. The study by Persaud et al. (2002) did not employ the "z" score and also did not recalibrate the shape parameter of the Vancouver and California models. The same is true for the study carried out by Hadayeghi et al. (2005). Hence, these studies were considered inappropriate for describing model transferability, as they did not provide any evidence of a recalibration procedure that accounted for safety differences among various regions.
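The quantities discussed above (the calibration factor of equation 2.17 and the "z" score of equations 2.15 and 2.16) can be computed as in the following sketch. It is illustrative only: the function names, the observed and predicted values, and the shape parameter are hypothetical, and in practice κ would first be recalibrated for the new data set.

```python
import math

def calibration_factor(y, mu):
    """Equation 2.17: sum of observed over sum of predicted collisions."""
    return sum(y) / sum(mu)

def pearson_chi2(y, mu, kappa):
    """Pearson chi-square with the negative binomial variance mu + mu**2/kappa."""
    return sum((yi - mi) ** 2 / (mi + mi ** 2 / kappa) for yi, mi in zip(y, mu))

def z_score(y, mu, kappa):
    """Equations 2.15 and 2.16: distance of chi-square from its expected value."""
    n = len(y)
    expected = n
    sigma = math.sqrt(2 * n * (1 + 3 / kappa)
                      + sum(1 / (mi * (1 + mi / kappa)) for mi in mu))
    return (pearson_chi2(y, mu, kappa) - expected) / sigma

# Hypothetical observed counts and base-model predictions for four zones.
observed = [12, 7, 20, 3]
base_pred = [10.0, 9.0, 15.0, 4.0]
F = calibration_factor(observed, base_pred)
recalibrated = [F * m for m in base_pred]       # same as scaling the constant
z = z_score(observed, recalibrated, kappa=2.5)  # kappa is an assumed value
```

Because both `pearson_chi2` and `sigma` depend on κ, the same predictions can give quite different "z" values when an unrecalibrated shape parameter is used.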
2.7.3 Two Alternative Methods for Model Transferability

Sawalha and Sayed (2005b) proposed two alternative methods for model transferability:

(i) Adopt the model as it is, without any change to the model parameters, but recalibrate the shape parameter $\kappa$ associated with the transferred model. This method is known as the Moment method.
(ii) Recalibrate the constant of the model equation before adopting it, so that the transferred model better suits local conditions, and also recalibrate the shape parameter $\kappa$. This method is known as the Maximum Likelihood method.

Sawalha and Sayed (2005b) recommended that, regardless of which method is chosen, recalibration of the shape parameter of a transferred model is necessary. Since all models that are transferred to a new jurisdiction are based on negative binomial residuals and involve shape parameters, there is no alternative transferability test other than the "z" score. Sawalha and Sayed (2005b) used collision, traffic, and geometric data from 283 arterial sections in Vancouver to develop micro-level negative binomial CPMs, and data from 102 arterial sections in Richmond to investigate transferability from Vancouver to Richmond. Transferability tests were then done using both of the above-mentioned procedures. The moment method for model transferability estimates the value of $\kappa$ using equation 2.18, described by Kulmala (1995):

$$\kappa = \frac{\sum_{i=1}^{N}\left[E(\Lambda_i)\right]^2}{\sum_{i=1}^{N}\left\{\left[y_i - E(\Lambda_i)\right]^2 - E(\Lambda_i)\right\}} \tag{2.18}$$

Using the above equation, Sawalha and Sayed (2005b) found the value of "z" to be -1.74 for the Richmond data under the Vancouver model. An attempt was also made to recalibrate the model using the IHSDM procedure. The procedure involved applying the IHSDM calibration factor (equation 2.17) to recalibrate the model constant and then calculating the "z" score from equation 2.16. This procedure resulted in a "z" score of -1.89, which was further away from zero.
Finally, the study used the maximum likelihood approach for recalibrating both the shape parameter and the constant term, using the "OFFSET" facility available in the software GLIM4. This resulted in a "z" score of 0.94, which was much closer to zero than the values calculated by the other two methods. Hence, the maximum likelihood approach was considered superior to the other methods of model transferability. This was because maximum likelihood parameter estimates are minimum variance estimates, which makes them more reliable than moment parameter estimates.

2.7.4 Macro-Level CPM Transferability by Maximum Likelihood Procedure

While the study by Sawalha and Sayed (2005b) on micro-level CPM transferability yielded promising results, the application of this approach to macro-level CPM transferability remains an issue of interest. Lovegrove (2006, 2007) carried out a transferability study for macro-level CPMs, which tested whether models for the City of Vancouver's 112 urban zones could be transferred to Richmond's 40 urban zones and Langley's 20 urban zones. As only data for the year 1996 were available, only geographic transferability was tested. The study also intended to find out the influence of sample size on the statistical tests of significance related to the transferability results. Following the transferability guidelines proposed by Sawalha and Sayed (2005b), the study resulted in "z" scores that were very close to zero. This suggested that all the models were transferred successfully according to statistical tests at the 95% level of confidence. Moreover, the results of the transferability test from Vancouver to Langley suggested that a smaller sample size may not be a major hindrance to macro-level model transferability. However, the study used datasets that were essentially subsets of the original datasets in the same space and time, which was suggested as the reason for the results being so good.
Hence, more research is needed on the transferability of macro-level CPMs to a more distant municipality and/or a different time period.

2.8 Summary

In recent years, incorporating safety explicitly into the transportation planning process has emerged as one of the strategies for improved transportation safety. Although traditional reactive road safety improvement programs can effectively identify and treat hazardous locations, the application of this approach requires the existence of a significant collision history. As such, proactive intervention appears to have greater potential for reducing collisions in a sustainable manner. However, there was a knowledge gap in the development of better tools and criteria for proactive road safety as part of the long-range transportation planning process. This knowledge gap led to the research on community-based macro-level CPMs for use in region-wide planning-level analysis. Nevertheless, one issue in developing macro-level CPMs is the lack of availability and quality of data, especially in smaller communities. Such communities may not have their own central geo-coded database and may have to rely on less reliable data. This problem gave rise to the issue of transferability of CPMs across different spatial-temporal regions. The literature review suggested that it would be beneficial if CPMs developed for one region in one period of time could be applied in a different period and in a different geographical region. Although different researchers have proposed different methodologies for model transferability, the maximum likelihood approach described by Sawalha and Sayed (2005b) and adapted and recommended for macro-level models by Lovegrove (2006, 2007) appeared to be superior to all other methods. Therefore, the next step of this research is to test macro-level CPM transferability by the maximum likelihood approach, which is discussed in the subsequent chapters of this thesis.
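The maximum likelihood recalibration summarized above can be sketched as follows. This is a simplified illustration, not the GLIM4 "OFFSET" procedure used by Sawalha and Sayed (2005b): it assumes the transferred model's predictions are held fixed, solves the likelihood score equation for the constant multiplier F by bisection, and picks κ from a user-supplied grid by profile likelihood. All function names and data values are hypothetical.

```python
import math

def nb_loglik(y, mean, kappa):
    """Negative binomial log-likelihood with means `mean` and shape kappa."""
    total = 0.0
    for yi, mi in zip(y, mean):
        total += (math.lgamma(yi + kappa) - math.lgamma(kappa)
                  - math.lgamma(yi + 1)
                  + kappa * math.log(kappa / (kappa + mi))
                  + yi * math.log(mi / (kappa + mi)))
    return total

def recalibrate_constant(y, mu, kappa, lo=1e-6, hi=1e6, iters=200):
    """ML multiplier F of the model constant for a fixed kappa.

    F is the root of sum((y_i - F*mu_i) / (kappa + F*mu_i)), which is
    strictly decreasing in F, so bisection (on a log scale) is sufficient.
    """
    def score(f):
        return sum((yi - f * mi) / (kappa + f * mi) for yi, mi in zip(y, mu))
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def recalibrate(y, mu, kappa_grid):
    """Return (F, kappa, log-likelihood) maximizing the profile likelihood."""
    best = None
    for kappa in kappa_grid:
        f = recalibrate_constant(y, mu, kappa)
        ll = nb_loglik(y, [f * mi for mi in mu], kappa)
        if best is None or ll > best[2]:
            best = (f, kappa, ll)
    return best

# Hypothetical observed counts and transferred-model predictions.
y_obs = [12.0, 7.0, 20.0, 3.0, 9.0]
mu_base = [10.0, 9.0, 15.0, 4.0, 6.0]
F_hat, kappa_hat, ll_hat = recalibrate(y_obs, mu_base, [0.5, 1, 2, 5, 10, 20])
```

The recalibrated predictions are `F_hat * mu_base`, and `kappa_hat` is the shape parameter that would then be used when computing the "z" score for the transferred model.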
CHAPTER 3
METHODOLOGY FOR DATA AGGREGATION & TRANSFERABILITY OF MODELS

3.1 Introduction

This chapter comprises three main parts. The first part, containing sections 3.2 to 3.7, describes the data extraction and aggregation approach for model development for the City of Kelowna, including information on the geographic scope, aggregation approach, variable definitions, and sources of data. The second part is section 3.8, where the methodology for macro-level CPM development for the City of Kelowna is described, with detailed descriptions of model groupings, the model form used, model development and model goodness of fit. Finally, section 3.9 describes the methodology for transferring the community-based macro-level CPMs developed using 1996 data, recalibrated for use in Kelowna using 2003 data.

3.2 Geographic Scope of Data

Data for this research were obtained from the City of Kelowna, located in the Central Okanagan Regional District (CORD), on the shore of Okanagan Lake, in the Province of British Columbia (BC), Canada, as shown in Figure 3.1 (Kelowna Official Community Plan Map, 2006). Two major highways, Highway 97 and Highway 33, pass through it. Kelowna is nestled in the interior of BC, between the Rocky Mountains to the east, the Coast Mountains to the west, and the Cascade Mountains to the south. With a total land area of 262 square kilometres and an average elevation of 1129 feet above sea level, Kelowna is home to 109,000 people living in about 60,000 dwellings (Census Canada 2003). The city has a labour force of 75,000 people, with a 91% employment rate and a 9% unemployment rate (Census Canada 2003).

Figure 3.1 has been removed due to copyright restrictions. The information removed is a map of the city of Kelowna in the Central Okanagan Regional District (CORD), in the province of British Columbia (BC), Canada.
Original source can be found at www.city.kelowna.bc.ca

Figure 3.1: City of Kelowna

Figure 3.2 shows details of the city sectors in Kelowna (Kelowna Official Community Plan Map, 2006). Most of the population live and work in the urban sectors, including the Central City, Highway 97, Rutland, and South Pandosy/KLO sectors. Village sectors are those that contain clusters of small-scale residential, retail, and office uses, which provide for the convenience needs of area residents and are located along Glenmore, North Mission, Southwest Mission, University, Black Mountain, Guisachan, Capri and other villages within the City Centre. Given this geographic scope, the first step in this research was to decide how to aggregate the data.

Figure 3.2 has been removed due to copyright restrictions. The information removed is a map of the details of the city sectors in Kelowna, including the urban and rural sectors.
Original source can be found at www.city.kelowna.bc.ca

Figure 3.2: City Sectors in Kelowna

3.3 Aggregation Units

To be consistent with the model development and model transferability methodology recommended in the model use guidelines for the original GVRD models (Lovegrove 2007, Lovegrove & Sayed, 2006, Sawalha & Sayed, 2005b), the aggregation unit was based on the 372 traffic analysis zones (TAZs) used in the CORD’s transportation planning model, which is a classic four-step model built using TMODEL™2 software. A detailed description of the TMODEL™2 software is beyond the scope of this research. However, the model’s basic inputs, mechanics, and outputs are very similar to those of other popular transportation planning model software (e.g. Emme/2). Inputs include, among others: geo-spatially referenced intersection (node) and road (link) locations and attributes, locations of TAZ boundaries, modelled (major) road networks, and zonal population and employment. Outputs include, among others: link volumes, travel times, and congestion levels.

Similar to other conventional transportation planning models, the CORD model’s TAZ sizes and boundaries had been chosen by CORD practitioners to keep zonal population and employment approximately uniform across the region. Fortunately, the CORD’s zonal boundaries also overlapped closely with Census Tract (CT) boundaries, which facilitated obtaining current and future demographic and land use data at a much greater level of detail and accuracy in each zone. Figure 3.3 shows the close correspondence between the TMODEL™2 TAZ boundaries and the CT boundaries across the CORD. This overlap of the TAZs with the CTs also helps to ensure a practical, neighbourhood-focused planning tool.
Figure 3.3: Kelowna TMODEL™2 TAZ Boundary & Census Tract Boundary

3.4 Variable Definitions

Once the geographic scope and aggregation units were identified, it was necessary to minimize possible aggregation bias at the zonal level, which was done in two ways. First, although the CORD’s TAZs were reportedly selected to maintain uniform population and employment levels across all zones, there was significant variability in zonal area (AR), particularly between rural and urban areas. These differences in AR could have biased the statistical association between zonal size, population and/or employment and collision forecasts (for example, a greater number of collisions may result from larger zones that contain more people). Therefore, it was necessary to stratify variables by separating them into rural and urban classes, using a population density measure (pop/km²). To accomplish this, the census population of each Dissemination Area (DA) was divided by the DA area to obtain the population density. Here, a DA is a small census area composed of one or more neighbouring blocks canvassed during the 2003 census, with a population of 400 to 700 persons (Census Canada 2003). All of Canada is divided into dissemination areas. The DAs were then sorted and grouped according to population density. Using ArcGIS 9.0, the DAs were cross-referenced with their geo-spatially corresponding TAZs, which were then divided into urban and rural classes according to the calculated DA population density. Table 3.1 shows the urban and rural classes used for the TAZs according to population density.
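The density calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample dissemination areas are hypothetical, and the only value taken from this research is the 800 persons per square kilometre threshold separating rural from urban areas. Treating a density of exactly 800 as rural is an assumption, since the text does not specify that case.

```python
# Hypothetical helper names; only the 800 persons/km^2 threshold comes from the text.
RURAL_URBAN_THRESHOLD = 800.0  # persons per square kilometre


def population_density(population, area_km2):
    """Return persons per square kilometre for one dissemination area (DA)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


def classify(density):
    """Label a density as 'urban' (above the threshold) or 'rural' (otherwise)."""
    return "urban" if density > RURAL_URBAN_THRESHOLD else "rural"


# Made-up dissemination areas: name -> (population, area in km^2)
sample_das = {"DA-1": (550, 0.25), "DA-2": (480, 4.0)}

labels = {
    name: classify(population_density(pop, area))
    for name, (pop, area) in sample_das.items()
}
print(labels)
```

In practice the DA labels would then be carried over to the TAZs through the GIS cross-reference described above; that spatial step is not reproduced here.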
In this research, an area with a population density under 800 persons per square kilometre was considered rural, and an area with a population density over 800 persons per square kilometre was considered urban, following the threshold originally used in the Okanagan Valley Quest Model (Lovegrove and Stanos, 2006). The goal of the Okanagan Partnership’s Regional Planning Flagship (OPRPF) was to build the region’s first integrated strategic growth plan for air, land, water and transport; a detailed description is beyond the scope of this research. The DA classification for Kelowna is shown in Figure 3.4, with the rural zones outlined in red and the urban zones outlined in blue.

A second technique employed to minimize possible aggregation bias resulting from zonal size and population variations was to define variables using density measures wherever possible, rather than absolute measures (e.g. zonal population density versus absolute zonal population).

Figure 3.4: Rural and Urban Zones in Kelowna

Table 3.1: Urban and Rural Classes According to Population Density

Population Density (pop/km²) | Area Class | Urban/Rural
5 - 50 | Rural | Rural
50 - 800 | Rural Residential | Rural
800 - 4000 | Suburban | Urban
4000 - 8000 | Semi Urban | Urban
> 8,000 | Urban | Urban

3.5 Selection of Candidate Variables

In order to carry out the macro-level CPM transferability test from the GVRD to Kelowna, the model form and variables used in this research were based on the GVRD macro-level CPM development described in Lovegrove and Sayed (2006). This research attempts to use several of those recently developed macro-level CPMs in a model transferability test. Therefore, the same candidate variables used for developing the GVRD macro-level CPMs were selected to determine whether the models could be transferred to Kelowna.

3.6 Stratification of Variables

Data were stratified for both the dependent and independent variable sets, in accordance with previous research and model use guidelines (Lovegrove et al. 2006, 2007).
The dependent variable consisted of forecasted zonal collision frequency, which was grouped into the following categories:

o Collision type: pedestrian-vehicle, motorcycle-vehicle, bicycle-vehicle, and vehicle-vehicle collisions.
o Collision severity: Fatal, Injury, and Property-Damage-Only (PDO) collisions.
o Collision time period: all collisions that occurred in a three-year period (2001 to 2003) were considered, broken down into the following time periods:
    o AM rush hour collisions (6:00 - 9:00 AM);
    o PM rush hour collisions (3:00 - 6:00 PM);
    o Non-rush hour collisions; and,
    o Total (24 hour) collisions.

The independent or explanatory variables were stratified in three levels:

o The first level included four themes of variables, namely Exposure, Socio-Demographic (S-D), Transportation Demand Management (TDM), and Network variables.
o The second level of stratification used two types of exposure variables, either measured or modelled. Measured data consisted of information that was obtained either digitally, using ArcGIS 9.0 software, or manually from land use, census, and road maps or databases. Modelled data consisted of traffic volume, speed, and volume to capacity ratio (VC) (i.e. congestion), which were output from the CORD’s transportation planning model, TMODEL™2.
o The third level of stratification was based on the urban and rural land use types discussed earlier.

Table 3.2 summarizes the candidate variables with data sources, year, abbreviation, units, and exposure variable extraction method (i.e. either measured or modelled), along with descriptive statistics.

Table 3.2: Candidate Variables - Collisions, Exposure, S-D, TDM, Network

Variable | Symbol | Source | Method | Year | Total | Zonal Avg
1. Collisions
Total collisions over 3 years | T3 | ICBC | Measured | 2001-2003 | 16820 | 45
Severe collisions (fatal & injury) | S3 | ICBC | Measured | 2001-2003 | 3493 | 9
PDO collisions | PDO | ICBC | Measured | 2001-2003 | 8683 | 23
Total rush hour collisions | R3 | ICBC | Measured | 2001-2003 | 5858 | 16
Non rush hour collisions | NR3 | ICBC | Measured | 2001-2003 | 10962 | 29
Severe rush hour collisions | RS3 | ICBC | Measured | 2001-2003 | 1359 | 4
Pedestrian-vehicle collisions | P3 | ICBC | Measured | 2001-2003 | 124 | 2
Bicycle-vehicle collisions | B3 | ICBC | Measured | 2001-2003 | 128 | 2
Motorcycle-vehicle collisions | M3 | ICBC | Measured | 2001-2003 | 151 | 2
Vehicle-vehicle collisions | V3 | ICBC | Measured | 2001-2003 | 16417 | 44
2. Exposure
Zonal area (hectares) | AR | TMODEL™2 | Measured | 2003 | 22124 | 59
Total lane km (from ArcGIS) | TLKM | ICBC | Measured | 2003 | 1733 | 4.66
Total vehicle km travelled | VKT | TMODEL™2 | Modelled | 2003 | 201944 | 543
Average zonal speed (km/h) | SPD | TMODEL™2 | Modelled | 2003 | N/A | 40
Average zonal congestion level | VC | TMODEL™2 | Modelled | 2003 | N/A | 0.315
3. Socio-Demographic
Urban zones | URB | Okanagan Valley Quest Model | Measured | 2006 | 145 | N/A
Rural zones | RUR | Okanagan Valley Quest Model | Measured | 2006 | 227 | N/A
Total population | POP | Census | Measured | 2003 | 95459 | 257
Population density (POP/AR) | POPD | Census | Measured | 2003 | N/A | 12
Participation in labour force % (emp-unemp/Pop15) | PARTP | Census | Measured | 2003 | N/A | 58%
Zonal jobs in finance, institutional, retail, manufacturing, construction, utilities and home based business | WKG | TMODEL™2 | Modelled | 2003 | 48479 | 130
Zonal jobs per capita (WKG/POP) | WKGP | TMODEL™2 | Modelled | 2003 | N/A | 1.86
Zonal job density (WKG/AR) | WKGD | TMODEL™2 | Modelled | 2003 | N/A | 13.57
Average income ($) | INCA | Census | Measured | 2003 | N/A | 22749
Average zonal family size | FS | Census | Measured | 2003 | N/A | 2.16
Home density (NH/AR) | NHD | Census | Measured | 2003 | N/A | 4.53
4. TDM
Total commuters from each zone | TCM | Census | Measured | 2003 | 32228 | 173
Commuter density (TCM/AR) | TCD | Census | Measured | 2003 | N/A | 4
Core area (area w/o major roads) (%) | CORE | Census | Measured | 2003 | N/A | 93%
No. of drivers (%) | DRV | Census | Measured | 2003 | N/A | 82%
No. of commuters as passenger (%) | PASS | Census | Measured | 2003 | N/A | 6%
No. of commuters by biking (%) | BIKE | Census | Measured | 2003 | N/A | 3%
No. of commuters by walking (%) | WALK | Census | Measured | 2003 | N/A | 6%
No. of commuters by motorcycle (%) | MOTOR | Census | Measured | 2003 | N/A | 0%
No. of commuters by taxi cab (%) | TAXI | Census | Measured | 2003 | N/A | 0%
No. of commuters by transit (%) | BUS | Census | Measured | 2003 | N/A | 2%
Shortcut capacity on local roads (vph) | SCC | ICBC | Measured | 2003 | 6447 | 17.33
Shortcut attractiveness (SCC x VC) | SCVC | ICBC | Modelled | 2003 | 2219 | 5.96
5. Network
No. of signals | SIG | City of Kelowna Transportation Division | Measured | 2003 | 99 | 0.27
Signal density (SIG/AR) | SIGD | City of Kelowna Transportation Division | Measured | 2003 | N/A | 0.025
No. of intersections | INT | ICBC DRA | Measured | 2003 | 1925 | 5.2
Intersection density (INT/AR) | INTD | ICBC DRA | Measured | 2003 | N/A | 0.21
No. of 3-way intersections/INT (%) | I3WP | ICBC DRA | Measured | 2003 | N/A | 62.7%
No. of arterial-local intersections/INT (%) | IALP | ICBC DRA | Measured | 2003 | N/A | 15.3%
No. of collector-local intersections/INT (%) | ICLP | ICBC DRA | Measured | 2003 | N/A | 11.6%
No. of arterial lane km | ALKM | ICBC DRA | Measured | 2003 | 378 | 1.02
No. of collector lane km | CLKM | ICBC DRA | Measured | 2003 | 165 | 0.44
No. of local lane km | LLKM | ICBC DRA | Measured | 2003 | 1190 | 3.2
No. of arterial lane km/TLKM (%) | ALKP | ICBC DRA | Measured | 2003 | N/A | 22%
No. of collector lane km/TLKM (%) | CLKP | ICBC DRA | Measured | 2003 | N/A | 9.5%
No. of local lane km/TLKM (%) | LLKP | ICBC DRA | Measured | 2003 | N/A | 68.5%

3.7 Aggregation of Each Candidate Variable

With each of the datasets screened and stratified, the next step was to aggregate the spatially disaggregated data by traffic zone.
GIS was the key tool used to spatially capture the data by traffic zone. The ArcGIS 9.0 software package was used to capture the traffic-related data as well as the demographic and intersection-related data. The following subsections provide a brief overview of the data aggregation process for each of the variables in each model group (i.e. exposure, S-D, TDM, and network) listed in Table 3.2.

3.7.1 Exposure Variables

Exposure variables consisted of both measured and modelled data. The Kelowna transportation model, TMODEL™2, provided modelled data in its traffic assignment output files for exposure variables, including: link number, node-to-node connectivity, link length, assigned link traffic volume in each direction, travel time, speed, and congestion level (i.e. link volume/capacity ratio). For aggregation of exposure data, the most difficult challenge was the unavailability of a geo-referenced road network for the modelled roads output by TMODEL™2. The available software resources could only convert the TMODEL™2 output files into an MS Excel spreadsheet containing link numbers, node numbers, node-to-node connectivity, and link attributes (i.e. volume, speed, capacity, number of lanes). Fortunately, an AutoCAD drawing of the modelled road network, complete with node numbers, was provided by the City of Kelowna. On examination, this CAD-rendered version bore little resemblance to the actual digital road network, as shown in Figure 3.5, where the orange and red lines show the links in the modelled road network and the real road network, respectively. These disparities were not surprising given that the modelled road network was constructed using manual trace techniques.
(a) Modelled road network in CAD
(b) Real road network in ArcGIS

Figure 3.5: Road Network

In order to overlay and geo-reference the CAD modelled road network with the zonal map and with the digital road map, several known ‘landmarks’ were used, including the Okanagan Lake floating bridge, which was distinguishable in both maps. Using this as the reference point, the CAD modelled road network was transformed into a GIS shapefile and aligned with the zonal map with reasonable accuracy by a method known as “Two Point Transformation” in ArcGIS. The resulting GIS shapefile, complete with the node numbers of each link, was then used to identify which links fell within a particular zone. Using the spreadsheet of assigned link attributes, the corresponding values of link length, volume, speed and capacity could be obtained. The VKT for a particular zone was thus calculated as the sum of the products of link traffic volume and link length, using only that portion of any link beginning, ending, or passing through the zone, as described by Equation 3.1.

$$ VKT_i = \sum_{k=1}^{n} Length_k \times Volume_k \tag{3.1} $$

Where,
i = 1, 2, 3, ..., 372, zone numbers
n = number of links that begin, end, or pass through a zone
Length_k = the portion of link k falling within zone i
Volume_k = the traffic volume assigned by TMODEL™2 to link k

Similarly, the average zonal congestion (VC) was calculated as the average of all modelled link volume to capacity ratio (V/C) values in a particular zone, as expressed by Equation 3.2.

$$ VC_i = \frac{1}{n} \sum_{k=1}^{n} (V/C)_k \tag{3.2} $$

Where,
i = 1, 2, 3, ..., 372, zone numbers
n = number of links that begin, end, or pass through a zone
(V/C)_k = volume to capacity ratio of link k falling within zone i

Unfortunately, no modelled exposure output data on Transit Kilometres Travelled (TKT) were available for this research. However, given that the GVRD’s zonal TKT was typically far less than zonal VKT (about 10%), TKT was not considered in this research.
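Equations 3.1 and 3.2 amount to a grouped sum and a grouped mean over the link segments that fall in each zone. The following Python sketch illustrates this under stated assumptions: the record layout, the function name, and all numerical values are hypothetical and are not taken from the Kelowna dataset.

```python
from collections import defaultdict

# Each record: (zone id, link length inside the zone in km, assigned volume, V/C ratio).
# All values are made up for illustration.
link_segments = [
    (1, 0.5, 1200, 0.40),
    (1, 1.0, 800, 0.20),
    (2, 2.0, 500, 0.30),
]


def zonal_vkt_and_vc(segments):
    """Return {zone: (VKT, mean V/C)} following Equations 3.1 and 3.2."""
    vkt = defaultdict(float)
    vc_sum = defaultdict(float)
    n_links = defaultdict(int)
    for zone, length_km, volume, vc in segments:
        vkt[zone] += length_km * volume  # Equation 3.1: sum of length x volume
        vc_sum[zone] += vc               # numerator of Equation 3.2
        n_links[zone] += 1               # n, the number of links in the zone
    return {zone: (vkt[zone], vc_sum[zone] / n_links[zone]) for zone in vkt}


result = zonal_vkt_and_vc(link_segments)
for zone, (zone_vkt, zone_vc) in sorted(result.items()):
    print(zone, zone_vkt, round(zone_vc, 3))
```

Note that the mean V/C in this sketch is unweighted, as in Equation 3.2; weighting by link length or volume would be a different design choice.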
Transit share in Kelowna, at a 2% zonal average, was also lower than the 9% observed in the GVRD, which further supports omitting TKT. All other data were measured and required no transportation modelling software to extract.

3.7.2 Socio-Demographic (S-D) Variables

The Socio-Demographic data were obtained from the Census Canada 2003 database. The census data were provided in a variety of spatial formats and had to be converted to traffic zone aggregations. Although geo-referenced census tract (CT) and TAZ boundaries overlapped closely where common boundaries existed, there were significant differences in size between the generally larger CTs and the smaller TAZs, which had the potential to introduce error into the data aggregation and/or modelling results. Large CTs often contained several TAZs, which complicated efforts to assign CT demographic data to the individual TAZs contained within them. Therefore, to minimize possible errors due to land use and population variations, Dissemination Areas (DAs) were used instead, which divide the Census Tracts into much finer segments. In most cases, this resulted in only one TAZ per DA and a much simpler translation of Socio-Demographic data to TAZs. Finally, a combination of land use mapping and zonal population data was used to pro-rate a reasonable distribution of the data from DAs to each TAZ. Figure 3.6 shows the regional distribution of population density (POPD), with the highest densities shaded with darker colours and concentrated around the urban city centres. The lowest densities are shaded with light yellow colours and are located in the village centres and agricultural lands.

Figure 3.6: Population Density in the City of Kelowna

3.7.3 Network Variables

Most of the network variables were aggregated digitally, with a few aggregated manually, and all variables consisted of measured data. The Insurance Corporation of British Columbia (ICBC) provided a Digital Road Atlas (DRA) map with detailed road classifications.
Using ArcGIS 9.0 software, semi-automated extractions were performed to identify and aggregate data points in each TAZ on the number of intersections (INT), intersection density (INTD), percentage of 3-way intersections (I3WP), percentage of arterial-local intersections (IALP) and percentage of collector-local intersections (ICLP). The Signal Density (SIGD) and Number of Signals (SIG) in each zone were determined in the same way with the help of the 2003 signal plan map provided by the City of Kelowna Transportation Division. While most data points, such as the number of intersections and intersection types, could be extracted directly from the DRA, the most complex network data to extract for each zone were the Total Lane Kilometres (TLKM) and the associated Arterial Lane Kilometres (ALKM), Collector Lane Kilometres (CLKM), and Local Lane Kilometres (LLKM). Fortunately, GIS software (ArcGIS 9.0) reduced this task to a few steps. To extract TLKM data, the following steps were needed:

1. First, the zonal map was overlaid with the DRA map in ArcGIS, with checks made to ensure reasonable geo-spatial accuracy between the two layers. The “Intersect” tool in ArcGIS was used to split, or “cookie-cut”, the digital road network by traffic zone. Figure 3.7 illustrates the process of splitting the road network by traffic zone.

(a) Unsplit road network
(b) Split road network captured by traffic zone

Figure 3.7: Illustration of the process of splitting the road network by traffic zone system

2. The next step was to calculate the length of each split road link falling in each zone. The “Field Calculator” in ArcGIS made this possible using the following Visual Basic command:

    Dim Output As Double
    Dim pCurve As ICurve
    Set pCurve = [shape]
    Output = pCurve.Length

3. Once the length of each road link falling within a particular TAZ had been calculated, it was necessary to associate these calculated link lengths with zone numbers. The Data Management tools in ArcGIS made this possible using the “Dissolve” tool with the “Sum” statistic, by which all calculated link lengths were summed and dissolved according to zone number. A snapshot of this dissolving process is shown in Figure 3.8.

Figure 3.8: Snapshot of a GIS file for calculating Total Road Kilometre

In this way, the sum of all link lengths falling inside each zone was obtained, giving the total road kilometres for each zone. Once the total road kilometres were determined for each zone, the next step was to calculate the Total Lane Kilometres (TLKM) by road class for each TAZ.

4. The final step to determine zonal TLKM required knowledge of the number of lanes in each direction for each road in the zone. To ascertain the lanes on each road, Google Earth was used. It is a public, web-based, real time map that provides visual details of road networks in North America, including road classifications and the number of lanes in each direction. This web-based resource was used to assign laning to each road link. Hence, for a particular zone, the total road kilometres were determined for each road class using ArcGIS, and the number of lanes on that road class was determined by visual observation using Google Earth. TLKM in each zone was then simply the total road kilometres by road class multiplied by the number of lanes of that road class.
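The arithmetic behind steps 3 and 4 can be sketched outside of ArcGIS as a grouped sum followed by a multiplication. The Python example below is a minimal sketch under stated assumptions: the record layout, the function name, the lane counts, and the use of metres for link lengths are all hypothetical, and it does not call any ArcGIS API.

```python
from collections import defaultdict

# (zone id, road class, split link length in metres); values are made up.
split_links = [
    (1, "arterial", 1200.0),
    (1, "local", 800.0),
    (1, "local", 700.0),
    (2, "collector", 500.0),
]

# Assumed number of lanes for each road class in each zone (hypothetical values,
# standing in for the visual observation described in step 4).
lanes = {(1, "arterial"): 4, (1, "local"): 2, (2, "collector"): 2}


def total_lane_km(links, lanes_by_class):
    """Sum road km per (zone, class), then weight by lanes to get zonal TLKM."""
    road_km = defaultdict(float)
    for zone, road_class, length_m in links:
        road_km[(zone, road_class)] += length_m / 1000.0  # analogue of Dissolve + Sum
    tlkm = defaultdict(float)
    for (zone, road_class), km in road_km.items():
        tlkm[zone] += km * lanes_by_class[(zone, road_class)]
    return dict(tlkm)


tlkm = total_lane_km(split_links, lanes)
print({zone: round(value, 2) for zone, value in tlkm.items()})
```

Keeping the intermediate totals keyed by road class also yields the class-specific quantities (ALKM, CLKM, LLKM) if they are summed separately rather than combined.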
3.7.4 Transportation Demand Management (TDM) Variables

Data for the Transportation Demand Management (TDM) variables came from the Census Canada 2003 database and from ArcGIS land use and transportation mapping. Unfortunately, commuter data were not available at the DA level; therefore, all TDM data were transferred to TAZs by developing a CT to TAZ translation table only. Census data were gathered for the number of commuters (TCM) broken down by mode of travel, including: drivers, passengers, bicycles, motorcycles, taxi cab users, pedestrians, and public transit. The TDM variable Neighbourhood Core Area (CORE) was extracted for each of the 372 zones using ArcGIS and visual examination, where CORE was defined by van Minnen (1999) as the largest portion of the traffic zone area not bisected by major roads. To be consistent with previous variable definitions, CORE values were expressed as a percentage of total zone area. The other two variables, Shortcut Capacity (SCC) and Shortcut Attractiveness (SCVC), as defined by Lovegrove (2006, 2007), were used to provide further TDM traits related to the neighbourhood’s access structure and road network. The mathematical definitions of these two variables are presented in Equations 3.3 and 3.4 below.

$$ SCC = \frac{L \cdot W \cdot C_L \cdot (R_{NS} + R_{EW}) \cdot C_{TC}}{A_r} \tag{3.3} $$

$$ SCVC = SCC \cdot VC \tag{3.4} $$

Here:
SCC = shortcutting capacity
SCVC = shortcutting attractiveness
L = average number of local road lanes in each direction
W = 1 for one-way, 2 for two-way
C_L = typical local road capacity (assumed as 150 veh/lane/hr)
R_NS, R_EW = number of (north-south, east-west) local roads running completely across the zone
C_TC = degree of zonal traffic calming (0 if traffic calmed; 1 if no calming; 0.5 if some traffic calming)
A_r = zonal area
VC = average zonal congestion level

In the above equations, data values were assigned to each variable by a visual examination of the neighbourhood street network, land use, and zone boundary maps, with the help of Google Earth and of ArcGIS for TAZ boundaries.

3.7.5 Collision Variables

Collision data were obtained from the Insurance Corporation of British Columbia (ICBC) for the period 2001 to 2003. ICBC uses the breakdown of incident severity shown in Table 3.3, which includes several categories that were not considered relevant to this research, including “uncategorized”, “service” and “crime”. Material Damage <= $1000 and Material Damage > $1000 were considered as PDO collisions.

Table 3.3: Incident Severity as Defined by ICBC

CODE | DESCRIPTION
M | Material Damage <= $1000
N | Material Damage > $1000
I | Injury
F | Fatality
L | Other
C | Crime
S | Service
U | Uncategorized

Figures 3.9 and 3.10 show the spatial distribution of collision density (collisions/hectare) for total and severe collisions, respectively. The patterns of total and severe collision densities closely resemble each other, which agrees with earlier results by Hadayeghi et al. (2003). The collision density patterns across the city are also very similar in many respects to the population density patterns shown earlier in Figure 3.6.

Figure 3.9: Collision Density for Total Collision

Figure 3.10: Collision Density for Severe Collision

3.8 Development of Macro-Level CPMs

With data for the candidate variables extracted and aggregated, model development and transferability of these models began on a group-by-group basis. As previously noted, the variables and forms used in this transferability research were based on GVRD research. In this thesis, an attempt has been made to develop Kelowna’s own macro-level CPMs as well
as recalibrate the GVRD models, built using 1996 data, for Kelowna using 2003 data, and thereby compare the CPMs produced by the two methods. The following sections summarize the methodology for model development and model transferability, which followed the research by Sawalha & Sayed (2005b) on micro-level CPM development and transferability and by Lovegrove (2006, 2007) on macro-level CPM development and use.

3.8.1 Background

In a previous transferability study involving GVRD municipalities and micro-level CPMs, Sawalha & Sayed (2005b) evaluated the transferability of micro-level CPMs developed using Vancouver intersection data and calibrated for use in Richmond, with promising results. The transferability study in this thesis was based on the procedure developed by Sawalha & Sayed (2005b). A GVRD dataset was available, consisting of the 1996 dataset used in the original macro-level CPM development. Data were extracted for Kelowna for the years 2001 to 2003, which provided an opportunity to test the transferability of CPMs across both a different geographic region and a different time period.

3.8.2 Model Groupings

Model development proceeded according to the sixteen groups listed in Table 3.4, using the following explanatory data levels:

• Four themes of explanatory variables (Exposure, S-D, TDM and Network);
• Two types of land use (rural and urban);
• Two types of exposure data derivation (modelled or measured).

Table 3.4: Model Groups

Theme | Land Use | Derivation | Group #
Exposure | Urban | Modelled | 1
Exposure | Urban | Measured | 2
Exposure | Rural | Modelled | 3
Exposure | Rural | Measured | 4
Socio-Demographic (S-D) | Urban | Modelled | 5
Socio-Demographic (S-D) | Urban | Measured | 6
Socio-Demographic (S-D) | Rural | Modelled | 7
Socio-Demographic (S-D) | Rural | Measured | 8
Transportation Demand Management (TDM) | Urban | Modelled | 9
Transportation Demand Management (TDM) | Urban | Measured | 10
Transportation Demand Management (TDM) | Rural | Modelled | 11
Transportation Demand Management (TDM) | Rural | Measured | 12
Network | Urban | Modelled | 13
Network | Urban | Measured | 14
Network | Rural | Modelled | 15
Network | Rural | Measured | 16

As discussed earlier, this model development involved comparing the developed models to the transferred models.
Model transferability involved gauging whether models developed for the GVRD’s 479 urban zones and 93 rural zones could be transferred and calibrated for use in Kelowna’s 145 urban zones and 227 rural zones. Given the time and scope of this research, models were developed and transferred considering only total collisions over three years (T3), on the premise that, once total collision models could be transferred successfully, the rest of the models were most likely transferable as well. Therefore, to demonstrate that macro-level CPM transferability was feasible, the same measured and modelled total collision macro-level CPMs developed by Lovegrove (2007) and Lovegrove and Sayed (2006) from all four major variable themes were selected, as discussed in the next sections.

3.8.3 Model Form

To allow for both measured and modelled groupings, the model form used in the research is as follows:

$$ E(A) = a_0 \, Z^{a_1} \, e^{\sum_j b_j x_j} \tag{3.5} $$

where:
E(A) = predicted mean collision frequency
a_0, a_1, b_j = parameter estimates derived from GLIM4
Z = exposure variable (VKT for modelled, TLKM for measured)
x_j = independent explanatory variables (e.g. CORE, SIGD, POPD, VC etc.)

The log-linear transformation was carried out using a logarithmic link function in the GLIM4 software (Numerical Algorithms Group, NAG 1996), transforming Equation 3.5 into:

$$ \ln[E(A)] = \ln(a_0) + a_1 \ln(Z) + \sum_j b_j x_j \tag{3.6} $$

Once the total collision model forms were selected, the regression process followed.

3.8.4 Model Development

The development of CPMs for Kelowna using 2003 data was carried out using the GLIM4 statistical software package. The Generalized Linear Regression method (GLIM) used for this research was the same as, and closely followed, that of micro-level CPM development. As stated earlier, in order to facilitate the model transferability study, models were developed for Kelowna using the same form as the measured and modelled total collision CPMs for the GVRD.
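To make the model form of Section 3.8.3 concrete, the short Python sketch below evaluates Equation 3.5 and its log-linear counterpart, Equation 3.6, for a single zone. It is an illustration only: the function names, coefficients, and zone values are hypothetical and are not fitted GVRD or Kelowna parameters.

```python
import math


def predicted_collisions(a0, a1, z, b, x):
    """Equation 3.5: E(A) = a0 * Z**a1 * exp(sum_j b_j * x_j)."""
    linear_part = sum(bj * xj for bj, xj in zip(b, x))
    return a0 * z ** a1 * math.exp(linear_part)


def log_predicted_collisions(a0, a1, z, b, x):
    """Equation 3.6: ln E(A) = ln(a0) + a1 * ln(Z) + sum_j b_j * x_j."""
    return math.log(a0) + a1 * math.log(z) + sum(bj * xj for bj, xj in zip(b, x))


# Hypothetical coefficients and zone values, for illustration only.
a0, a1 = 0.5, 0.8      # lead constant and exposure exponent
b = [0.02, -0.01]      # coefficients of two explanatory variables
x = [10.0, 5.0]        # values of those variables in one zone
z = 543.0              # exposure (e.g. zonal VKT)

mean = predicted_collisions(a0, a1, z, b, x)
print(round(mean, 2), round(log_predicted_collisions(a0, a1, z, b, x), 4))
```

The two functions agree after taking logarithms, which is what allows the multiplicative model to be estimated as a linear predictor under a log link.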
Hence, instead of following a forward stepwise procedure for the selection of explanatory variables, the same variables used in the GVRD models were retained for the Kelowna models in each of the model groups. This was done using the “FIT” command in GLIM4, which defines the linear model and causes the numerical estimation procedure to be carried out. A sample of a GLIM4 output file is given in Appendix A. Using the selected model form, model development and transfer of the models were then conducted to optimize goodness of fit.

3.8.5 Assessing Model Goodness of Fit

To assess goodness of fit for the developed and transferred models, two statistical measures were used. The first measure is the Pearson χ² statistic, given by the following equation (McCullagh & Nelder, 1989):

$$ \text{Pearson } \chi^2 = \sum_{i=1}^{n} \frac{[y_i - E(A_i)]^2}{\mathrm{Var}(y_i)} = \sum_{i=1}^{n} \frac{[y_i - E(A_i)]^2}{E(A_i)\,[1 + E(A_i)/\kappa]} \tag{3.7} $$

Where y_i is the observed accident count and κ is the negative binomial shape parameter. The second statistical measure used to assess the goodness of fit of the model is the scaled deviance (SD), described by the following equation (McCullagh & Nelder, 1989):

$$ SD = 2 \sum_{i=1}^{n} \left[ y_i \ln\!\left(\frac{y_i}{E(A_i)}\right) - (y_i + \kappa) \ln\!\left(\frac{y_i + \kappa}{E(A_i) + \kappa}\right) \right] \tag{3.8} $$

Both the Pearson χ² and the scaled deviance are asymptotically χ² distributed with n-p degrees of freedom, where n is the number of observations and p is the number of model parameters. For these, 95% was the desired level of confidence used to assess goodness of fit.

3.8.6 Performing Outlier Analysis

Because the same model forms described by Lovegrove and Sayed (2006) had to be used to carry out the transferability study from the GVRD to Kelowna, an outlier analysis was performed when developing Kelowna’s own collision prediction models. The outlier analysis described by Sawalha and Sayed (2005) was used to identify and delete those data points (i.e. TAZs) with the highest Cook’s Distance (CD) value (see Chapter 2 for details), which indicated points that were very likely to be outliers. The analysis was done in a stepwise progression, removing one point at a time, with the largest CD values first.
As each high-CD data point was removed, GLIM was re-run with the dispersion parameter κ fixed at the value obtained from the previous model, to test whether the point was an outlier. This produced a new scaled deviance (SD) value based on the revised dataset (i.e. with the candidate outlier removed). If the new SD dropped significantly relative to its previous value, the removed data point was considered an outlier and was kept out of the dataset. In this study, at the 95% confidence level, a drop in SD greater than 3.84 (the χ² critical value for one degree of freedom) per data point deleted was considered significant. The outlier search was repeated until the drop in SD fell below 3.84, indicating insignificance. Once the outliers were removed from the dataset, GLIM was re-run with κ = 0 so that GLIM itself determined new estimates for each parameter and for κ, along with descriptive statistics. If the improvement in fit was enough to bring the assessment statistics within their targets (i.e. SD and Pearson χ² below the target χ² value), the developed models as well as the transferred models were considered to have a good fit.

3.9 Macro-Level CPM Transferability

Generally, the Pearson χ² statistic, as given in Equation 3.7, measures the goodness of fit of any CPM to any dataset. This statistic can therefore be used to test whether a CPM developed with one dataset can generate reliable safety predictions for a completely different dataset. If a CPM applied to a new dataset is valid, and if the observations in the new dataset are independent, then the expected value and the variance of the Pearson χ² statistic are given by the following equations (Vogt and Bared, 1998):

$$E(\chi^2) = N, \qquad \sigma^2(\chi^2) = 2N\left(1 + \frac{3}{\kappa}\right) + \sum_{i=1}^{N} \frac{1}{E(\Lambda_i)\,[1 + E(\Lambda_i)/\kappa]} \qquad (3.9\text{a, b})$$

where N is the number of data points used to recalibrate the model, and E(Λ_i) and κ are taken from each recalibrated CPM. The model transferability calibration step thus involved statistical verification of each calibrated model's goodness of fit, using a z test statistic at the 95% confidence level, calculated using Equation 3.10 (Sawalha & Sayed, 2005b):

$$z = \frac{\text{Pearson } \chi^2 - E(\chi^2)}{\sigma(\chi^2)} \qquad (3.10)$$

A z value that is near zero, or less than one, confirms that a model is transferable (Sawalha & Sayed, 2005b). Following these steps, a maximum likelihood approach was used to transfer the models, in which both the shape parameter and the constant term of the transferred model were recalibrated. The "OFFSET" command in GLIM4 allows a model to be developed with some parameters forced to values chosen by the model developer, while the remaining parameters are determined by the method of maximum likelihood. Using the "OFFSET" facility and the Kelowna dataset, the GVRD models were recalibrated to see whether the recalibrated models could be transferred to Kelowna. All the transferred models therefore contain the same explanatory variables as the GVRD models, and the coefficients of those variables were forced to equal those of the GVRD models. Only the lead constant term and the shape parameter took new values, which were output by GLIM4. It is worth noting that the Kelowna dataset used for the transferability study was the one obtained after the outlier analysis carried out while developing models from scratch. A sample GLIM4 output file used for recalibrating models is given in Appendix B.

3.10 Summary

The main objective of this thesis was to develop macro-level CPMs for Kelowna and to test the transferability of the GVRD macro-level CPMs (developed using 1996 data) to Kelowna (using 2003 data). This chapter summarized the methodology used in this research, including data extraction and aggregation, model development, and model transferability.
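The goodness-of-fit and transferability statistics of Sections 3.8.5 and 3.9 can be illustrated with a minimal Python sketch. It is not the GLIM4 procedure used in this research; the function names are hypothetical, and the scaled deviance treats the term y ln(y/E(Λ)) as zero when y = 0, which is an assumption of this sketch.

```python
import math

def pearson_chi2(y, mu, kappa):
    """Pearson chi-square (Equation 3.7) with the negative binomial variance."""
    return sum((yi - mi) ** 2 / (mi * (1.0 + mi / kappa)) for yi, mi in zip(y, mu))

def scaled_deviance(y, mu, kappa):
    """Scaled deviance (Equation 3.8); y*ln(y/mu) is taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        first = yi * math.log(yi / mi) if yi > 0 else 0.0
        second = (yi + kappa) * math.log((yi + kappa) / (mi + kappa))
        total += first - second
    return 2.0 * total

def transfer_z(y, mu, kappa):
    """z score of the Pearson chi-square (Equations 3.9 and 3.10)."""
    n = len(y)
    expected = float(n)
    variance = 2.0 * n * (1.0 + 3.0 / kappa) + sum(
        1.0 / (mi * (1.0 + mi / kappa)) for mi in mu
    )
    return (pearson_chi2(y, mu, kappa) - expected) / math.sqrt(variance)

# Made-up observed counts and predictions for five zones (not thesis data).
observed = [12, 30, 7, 0, 18]
predicted = [10.0, 28.0, 9.0, 1.5, 20.0]
print(pearson_chi2(observed, predicted, 2.0))
print(scaled_deviance(observed, predicted, 2.0))
print(transfer_z(observed, predicted, 2.0))
```

In use, `predicted` would hold the E(Λ_i) values of a recalibrated model and `kappa` its recalibrated shape parameter; a z value near zero would then point towards transferability in the sense described above.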
The transferability research required consistent data definitions in order to calibrate the GVRD models for Kelowna. Hence, attention was given to ensuring that the geographic scope, variable definitions, aggregation units, aggregation approach, and stratification of the Kelowna data were sufficiently consistent with those of the GVRD data. Using these data and the GVRD total collision macro-level CPM equations, model development and transferability were then conducted to calibrate the models, remove outliers, and optimize goodness of fit. The general methodology and goodness-of-fit measures followed those described in Sawalha & Sayed (2005a, b), Lovegrove & Sayed (2006) and Lovegrove (2006, 2007), at a 95% level of confidence. Having described the data and the methodology for model development and model transferability, the results are presented in Chapter Four.

CHAPTER 4 MODEL DEVELOPMENT AND TRANSFERABILITY RESULTS

4.1 Introduction

Following the methodology for data aggregation, model development and model transferability described in Chapter Three, the resulting models are presented in this chapter in three sections. In Section 4.2, the macro-level CPMs developed for Kelowna are presented, categorized according to the four major model groups (i.e. exposure, S-D, network and TDM). In Section 4.3, the models transferred (recalibrated) from GVRD to Kelowna are presented. Finally, Section 4.4 briefly describes the benefits of transferred models over developed models. Throughout this research, both the development of models for Kelowna and the transfer of the GVRD models to Kelowna considered only one collision type, i.e. total collisions per three years (T3).

4.2 Development of Macro-Level CPMs: GVRD vs. Kelowna

Collision data and data for the independent variables were collected from the City of Kelowna for the years 2001 to 2003 to develop negative binomial macro-level Collision Prediction Models (CPMs).
Using these data, the resulting models are presented in the following sections, classified into urban and rural classes respectively.

4.2.1 Urban Models

The CPMs for the urban areas of Kelowna were developed using the GLIM4 statistical software package. The collision, exposure, S-D, network and TDM data for the 145 urban zones in Kelowna were used to develop negative binomial CPMs. The resulting urban models are listed in Table 4.1 with descriptive statistics, along with the original GVRD CPMs for comparison.

Table 4.1: Urban Models: GVRD vs. Kelowna
(Kelowna models use the GVRD model form with 2003 Kelowna data.)
Group | Model | Model form (Total Collisions / 3 yrs =) | κ | Pearson χ² | SD | χ²(0.05, dof) | t-statistics
1 Urban, Modeled, Exposure | GVRD | 1.15 VKT^0.685 e^(1.45 VC) | 1.7 | 495 | 508 | 510 | constant 0.4; VKT 13; VC [illegible]
1 Urban, Modeled, Exposure | Kelowna | 1.97 VKT^0.636 e^(−1.18 VC) | 1.83 | 129 | 153 | 163 | constant 1.96; VKT 9.04; VC −3.72
2 Urban, Measured, Exposure | GVRD | 92.48 TLKM^0.432 | 1.20 | 470 | 530 | 518 | constant 18; TLKM 7
2 Urban, Measured, Exposure | Kelowna | 29.56 TLKM^0.406 | 2.05 | 127 | 137 | 152 | constant 30.7; TLKM 5.23
5 Urban, Modeled, S-D | GVRD | 1.822 VKT^0.8188 e^(0.853 VC + 0.00401 WKGD + 0.004924 POPD − 0.5359 FS) | 2.0 | 461 | 500 | 505 | constant 1.1; VKT 16; VC 3; WKGD 3; POPD 4; FS [illegible]
5 Urban, Modeled, S-D | Kelowna | 2.099 VKT^0.692 e^(−1.467 VC + 0.0762 WKGD + 0.00213 POPD − 0.1683 FS) | 2.46 | 114 | 143 | 153 | constant 2.1; VKT 10.7; VC [illegible]; WKGD 1.26; POPD [illegible]; FS [illegible]
6 Urban, Measured, S-D | GVRD | 74.2175 TLKM^0.8218 e^(0.007462 POPD + 0.06295 UNEMP − 0.743 FS) | 1.60 | 508 | 518 | 514 | constant 9; TLKM 12; POPD 6; UNEMP 7; FS −5
6 Urban, Measured, S-D | Kelowna | 52.53 TLKM^0.164 e^(0.00316 POPD + 0.0532 UNEMP − 0.5743 FS) | 2.0 | 157 | 146 | 157 | constant 21; TLKM 8.6; POPD 1.96; UNEMP 3.94; FS [illegible]
9 Urban, Modeled, Network | GVRD | 0.90626 VKT^0.7851 e^(2.399 SIGD + 0.7947 INTD − 0.02213 I3WP) | 2.4 | 485 | 505 | 515 | constant 0.2; VKT 19; SIGD 4; INTD 6; I3WP −7
9 Urban, Modeled, Network | Kelowna | 2.29 VKT^0.4487 e^(1.53 SIGD + 1.035 INTD − 0.0004205 I3WP) | 1.84 | 160 | 156 | 164 | constant 2.37; VKT 8.4; SIGD 2.67; INTD 3.7; I3WP −0.22
10 Urban, Measured, Network | GVRD | 29.9342 TLKM^0.8675 e^(4.748 SIGD − 0.0204 I3WP + 0.007193 ALKP) | 1.9 | 511 | 511 | 514 | constant 11; TLKM 14; SIGD 6; I3WP −12; ALKP 3
10 Urban, Measured, Network | Kelowna | 17.26 TLKM^0.598 e^(5.30 SIGD − 0.00000585 I3WP + 0.005573 ALKP) | 2.67 | 129 | 137 | 151 | constant [illegible]; TLKM [illegible]; SIGD 6.6; I3WP −0.03; ALKP 2.3
13 Urban, Modeled, TDM | GVRD | 1.63052 VKT^0.6887 e^(0.07924 SCVC − 0.0000207 CORE + 0.000000912 DRIVE) | 1.9 | 483 | 510 | 513 | constant 1.4; VKT 15; SCVC 9; CORE −5; DRIVE 2
13 Urban, Modeled, TDM | Kelowna | 2.29 VKT^0.5877 e^(−0.00328 SCVC − 0.0000615 CORE + 0.000007534 DRIVE) | 1.67 | 132 | 154 | 161 | constant 2.4; VKT 8.6; SCVC −0.87; CORE −0.96; DRIVE 1.1
14 Urban, Measured, TDM | GVRD | 43.7285 TLKM^0.5762 e^(0.02702 SCC − 0.0000277 CORE + 0.000123 TCM) | 1.5 | 484 | 517 | [illegible] | constant 14; TLKM [illegible]; SCC 7; CORE −6; TCM 3
14 Urban, Measured, TDM | Kelowna | 27.6 TLKM^[illegible] e^(0.001804 SCC − 0.0004488 CORE − 0.000694 TCM) | 1.22 | 152 | 140 | 152 | constant 22.5; TLKM 9.96; SCC [illegible]; CORE −5.75; TCM −1.22

Most of the statistical associations found in Table 4.1 while developing macro-level CPMs for the City of Kelowna supported the earlier findings for the GVRD models, as follows:
o Exposure models: Increased collisions were associated with increases in vehicle kilometres travelled (VKT) and total road lane kilometres (TLKM). In other words, the more travel and the more road-kilometres, the higher the probability of collisions, which is quite intuitive.
o S-D models: Increased collisions were associated with increases in job density (WKGD), population density (POPD), and unemployment (UNEMP) levels. That is, the higher the job density and population density, the higher the probability of collisions. However, the association of increasing unemployment with increases in collisions is difficult to explain. Several recent studies (La Scala et al., 2002; Kmet et al., 2003) have also failed to find a statistical association between road safety and unemployment levels. Nevertheless, Lovegrove (2006, 2007) attributed this positive association to school trips made in peak periods by stay-at-home parents driving their children to and from school, and/or by secondary and post-secondary students. The other variable, family size (FS), had an inverse association with collision frequency, meaning that decreased collisions were associated with increased family size. This association supported the findings of Ladron de Guevara et al. (2004), who suggested that parents are more responsible drivers than children.
Lovegrove (2006) explained this relationship by noting that zones with a higher average family size have more children and fewer adults, meaning fewer commuters and therefore fewer collisions. All of these associations in the S-D model groups supported the earlier findings of the GVRD models.
o Network models: Increases in signal density (SIGD), intersection density (INTD) and total arterial road lane kilometres (ALKP) were intuitively associated with an increased number of collisions. Hence, more signals do not appear to mean safer roads. Likewise, the higher the intersection density and/or the total arterial road lane kilometres, the higher the probability of collisions. Although these three variables were directly associated with total collisions, the other network variable, the proportion of 3-way intersections (I3WP), was inversely associated with collision frequency. Hence, for urban areas, a useful strategy to enhance safety appeared to be minimizing the number of 4-leg signalized intersections through the use of staggered 3-way intersections, which reduce conflicts.
o TDM models: For the TDM models, shortcut capacity (SCC) and number of drivers (DRIVE) had a positive relationship with collisions. In other words, increases in these two variables were associated with increased chances of collisions. However, the other TDM variable, core neighbourhood size (CORE), had an inverse association with collisions, which confirmed the earlier findings of Lovegrove (2006, 2007) for the GVRD models. Hence, urban neighbourhoods that are surrounded by high-density traffic areas but are protected against shortcutting traffic by a large core seem to have a lower chance of collisions.

While many statistical associations mirrored those observed in the GVRD models, the observed associations of some variables were inconsistent with the GVRD model results, as follows:
o In the Modeled S-D model, the inverse association of average zonal congestion level (VC) with total collisions appeared counterintuitive, i.e. the lower the congestion level, the higher the probability of collisions. This counterintuitive association may be related to the fact that the VC data came from Kelowna's less advanced TMODEL modeling results; they were modeled rather than measured data and were less accurate than the GVRD values.
o In the Modeled TDM model, increased collisions were associated with decreases in shortcut attractiveness (SCVC), which contradicts the previous GVRD model result. The t-statistic of the SCVC variable was insignificant at the 95% level of confidence. SCVC was calculated as the product of shortcut capacity (SCC) and zonal congestion (VC). In calculating the SCC variable, it was assumed that there were no traffic-calmed zones in Kelowna at all (in other words, the degree of zonal traffic calming was set to 1). Hence, because of the subjective nature of the SCC calculation and the poor VC output from TMODEL, SCVC appeared to be counterintuitively associated with collision frequency.
o In the Measured TDM model, the association of total commuters (TCM) with collision frequency was reversed, contradicting the previous GVRD result. The t-statistic of this variable was insignificant at the 95% level of confidence. TCM was derived from census data and can therefore be considered reasonably accurate; it consists of the number of auto users/drivers, passengers, transit users, pedestrians, bike users, motorcyclists, taxi cab users and others. Hence, a confounding variable may be involved, and further research considering additional factors would be needed to verify the cause of this association.
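The models in Table 4.1 share one functional form: a lead constant, an exposure variable raised to a power, and an exponential of the weighted explanatory variables. A minimal Python sketch of evaluating that form is given below, using the Kelowna urban measured exposure model (Group 2). The function name and the 20 lane-km zone are hypothetical and serve only as an illustration.

```python
import math

def predict_collisions(a0, exposure, a1, coeffs=None, values=None):
    """Return a0 * exposure**a1 * exp(sum of b_j * x_j) for one zone."""
    coeffs = coeffs or {}
    values = values or {}
    linear = sum(b * values[name] for name, b in coeffs.items())
    return a0 * exposure ** a1 * math.exp(linear)

# Kelowna urban measured exposure model (Group 2): 29.56 * TLKM**0.406
# The 20 lane-km zone is a made-up input, not a Kelowna TAZ.
estimate = predict_collisions(29.56, 20.0, 0.406)
print(round(estimate, 1))
```

Models with additional variables would pass their coefficients and zonal values through `coeffs` and `values`, keyed by variable name.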
4.2.2 Rural Models

Negative binomial macro-level CPMs for the rural areas of Kelowna were developed for the 227 rural zones using the method and variables discussed earlier. The resulting Kelowna models for the rural class are presented in Table 4.2, along with descriptive statistics and the GVRD CPMs for comparison.

Table 4.2: Rural Models: GVRD vs. Kelowna
(Kelowna models use the GVRD model form with 2003 Kelowna data.)
Group | Model | Model form (Total Collisions / 3 yrs =) | κ | Pearson χ² | SD | χ²(0.05, dof) | t-statistics
3 Rural, Modeled, Exposure | GVRD | 0.32368 VKT^0.6478 e^(2.868 VC) | 2.4 | 67 | 87 | 101 | constant −2; VKT 10; VC 4
3 Rural, Modeled, Exposure | Kelowna | 1.84 VKT^0.487 e^(0.242 VC) | 0.664 | 239 | 213 | 210 | constant 1.79; VKT 6.42; VC 0.5
4 Rural, Measured, Exposure | GVRD | 1.92 TLKM^[illegible] | 1.6 | 94 | 90 | 103 | constant 3; TLKM 9
4 Rural, Measured, Exposure | Kelowna | 16.1 TLKM^0.282 | 0.64 | 200 | 205 | 206 | constant [illegible]; TLKM 2.64
7 Rural, Modeled, S-D | GVRD | 0.311 VKT^0.6344 e^(2.409 VC + 0.3529 NHD) | 2.7 | 67 | 88 | 98 | constant [illegible]; VKT 10; VC [illegible]; NHD 2
7 Rural, Modeled, S-D | Kelowna | 1.376 VKT^0.858 e^([illegible] VC + [illegible] NHD) | 0.88 | 200 | 201 | 202 | constant 1.01; VKT 5.6; VC [illegible]; NHD 5.62
8 Rural, Measured, S-D | GVRD | 0.05546 TLKM^1.337 e^([illegible] UNEMP + 0.6217 NHD) | 2.0 | 78 | 88 | 100 | constant −4; TLKM 9; UNEMP 3; NHD 4
8 Rural, Measured, S-D | Kelowna | 5.22 TLKM^[illegible] e^([illegible] UNEMP + [illegible] NHD) | 1.0 | 197 | 201 | 200 | constant 8.66; TLKM 7.82; UNEMP 1.52 and NHD 1.34 (signs illegible)
11 Rural, Modeled, Network | GVRD | 0.3189 VKT^0.654 e^(1.887 VC + 93.76 SIGD) | 2.7 | 78 | 89 | 100 | constant −2; VKT 10; VC 2; SIGD 3
11 Rural, Modeled, Network | Kelowna | 2.46 VKT^[illegible] e^(−0.058 VC + [illegible] SIGD) | 0.96 | 181 | 214 | 212 | constant 3.13; VKT 5.42; VC −0.15; SIGD 9.29
12 Rural, Measured, Network | GVRD | 0.4456 TLKM^[illegible] e^(193.7 SIGD + 0.0100 IALP − 0.0263 LLKP) | 2.8 | 77 | 85 | 98 | constant −1.4; TLKM 10; SIGD [illegible]; IALP 2; LLKP −5
12 Rural, Measured, Network | Kelowna | 11.39 TLKM^[illegible] e^(30.2 SIGD + 0.0078 IALP − 0.0046 LLKP) | 1.05 | 160 | 223 | 219 | constant [illegible]; TLKM [illegible]; SIGD 9.66; IALP 2.8; LLKP [illegible]
15 Rural, Modeled, TDM | GVRD | 0.303613 VKT^0.6429 e^(2.574 VC + 0.1661 TCD) | 2.6 | 67 | 88 | [illegible] | constant −3; VKT 10; VC 3; TCD 2
15 Rural, Modeled, TDM | Kelowna | 1.66 VKT^0.794 e^(0.4733 VC + 0.1103 TCD) | 0.87 | 187 | 201 | 201 | constant 1.6; VKT 5.49; VC [illegible]; TCD 3.41
16 Rural, Measured, TDM | GVRD | 0.162513 TLKM^1.496 e^(−0.004986 CRP) | 1.8 | 86 | 89 | [illegible] | constant −1.11; TLKM 10; CRP −2
16 Rural, Measured, TDM | Kelowna | 35.93 TLKM^[illegible] e^(−[illegible] CRP) | 0.89 | 197 | 205 | 205 | constant [illegible]; TLKM 7.46; CRP −2.1

In the rural models, many of the statistical associations observed in Table 4.2 agreed with the previous GVRD model results, as follows:
o Exposure models: Increased collisions were associated with increases in VKT and TLKM, which was quite intuitive and similar to the urban models. In other words, the more travel or road-kilometres, the higher the probability of collisions.
o S-D models: Increased collisions were associated with increases in average zonal congestion level (VC) and residential unit density (NHD), which is also quite intuitive.
o Network models: Increased signal density (SIGD) and percentage of arterial-local intersections (IALP) were associated with increased collision frequency. However, decreased collisions were associated with an increased proportion of local road lane kilometres (LLKP). This was because operating speeds are generally lower on local roads than on arterial or collector roads. Moreover, since LLKP was found to be the predominant predictor in the rural GVRD models of Lovegrove (2006, 2007), and since rural areas have relatively few local roads, a higher number of collisions can be expected in rural areas.
o TDM models: Intuitively, increased collisions were associated with increases in total commuter density (TCD). The other TDM variable, Core Area (CRP = CORE/AR, %), had an inverse association with collisions, confirming the earlier findings of Lovegrove (2006, 2007).

However, among the statistical associations found in Table 4.2, the effect of some variables on collisions was inconsistent with the previous GVRD model results, as follows:
o In the Measured S-D model, the association of unemployment (UNEMP) with collisions was not in accordance with the GVRD rural S-D model. The t-statistic of this variable was also not significant at the 95% level of confidence.
o In the Modeled Network model, the inverse association of VC (i.e. congestion) with total collisions appeared counterintuitive, once again due to the poor quality of the VC output from TMODEL.

4.3 Transferred Models

To transfer the macro-level GVRD CPMs to Kelowna, the maximum likelihood approach was used to calibrate both the shape parameter and the constant term of the transferred model, as discussed in Chapter Three. Using the "OFFSET" facility in GLIM4 and the Kelowna 2003 dataset, the 1996 GVRD models were recalibrated to see whether, and how well, the models could be transferred to Kelowna.
As part of the GLIM4 calibration process, only the constant term and the shape parameter of the transferred models were given new values; the other parameters remained the same. The following sections describe the results for the transferred models in detail, broken down into urban and rural classes respectively.

4.3.1 Urban Models

Following the transferability guidelines proposed in Chapter Three, transferability tests were carried out to see how well the models developed from the GVRD's 479 urban zones could be transferred to Kelowna's 145 urban zones. Table 4.3 summarizes the results for the urban group. It lists the CPMs transferred from GVRD to Kelowna (using GVRD model forms but Kelowna 2003 data) along with the CPMs developed for the GVRD by Lovegrove et al. (2006, 2007) using 1996 data, for comparison.

Table 4.3: Transferability Results for the Kelowna Urban Models
(Transferred models use GVRD coefficients and model form with Kelowna 2003 data.)
Group | Model | Model form (Total Collisions / 3 yrs =) | κ | Pearson χ² | SD | χ²(0.05, dof) | t-statistics and z score
1 Urban, Modeled, Exposure | GVRD | 1.15 VKT^0.685 e^(1.45 VC) | 1.7 | 495 | 508 | 510 | constant 0.4; VKT 13; VC [illegible]
1 Urban, Modeled, Exposure | Transferred | 0.592 VKT^0.685 e^(1.45 VC) | 1.08 | 163 | 158 | 165 | constant [illegible]; z = 0.0158
2 Urban, Measured, Exposure | GVRD | 92.48 TLKM^0.432 | 1.20 | 470 | 530 | 518 | constant 18; TLKM 7
2 Urban, Measured, Exposure | Transferred | 28.69 TLKM^0.432 | 2.05 | 128 | 137 | 153 | constant [illegible]; z = −0.013
5 Urban, Modeled, S-D | GVRD | 1.822 VKT^0.8188 e^(0.853 VC + 0.00401 WKGD + 0.004924 POPD − 0.5359 FS) | 2.0 | 461 | 500 | 505 | constant 1.1; VKT 16; VC 3; WKGD 3; POPD 4; FS [illegible]
5 Urban, Modeled, S-D | Transferred | 7.85 VKT^0.8188 e^(0.853 VC + 0.00401 WKGD + 0.004924 POPD − 0.5359 FS) | 1.85 | 143 | 145 | 158 | constant [illegible]; z = 0.004
6 Urban, Measured, S-D | GVRD | 74.2175 TLKM^0.8218 e^(0.007462 POPD + 0.06295 UNEMP − 0.743 FS) | 1.60 | 508 | 518 | 514 | constant 9; TLKM 12; POPD 6; UNEMP 7; FS −5
6 Urban, Measured, S-D | Transferred | 49.12 TLKM^0.8218 e^(0.007462 POPD + 0.06295 UNEMP − 0.743 FS) | 2.09 | 162 | 145 | 161 | constant 63.3; z = 0.032
9 Urban, Modeled, Network | GVRD | 0.90626 VKT^0.7851 e^(2.399 SIGD + 0.7947 INTD − 0.02213 I3WP) | 2.4 | 485 | 505 | 515 | constant 0.2; VKT 19; SIGD 4; INTD 6; I3WP −7
9 Urban, Modeled, Network | Transferred | 1.94 VKT^0.7851 e^(2.399 SIGD + 0.7947 INTD − 0.02213 I3WP) | 0.92 | 152 | 159 | 163 | constant 7.29; z = 0.0013
10 Urban, Measured, Network | GVRD | 29.9342 TLKM^0.8675 e^(4.748 SIGD − 0.0204 I3WP + 0.007193 ALKP) | 1.9 | 511 | 511 | 514 | constant 11; TLKM 14; SIGD 6; I3WP −12; ALKP 3
10 Urban, Measured, Network | Transferred | 53.9 TLKM^0.8675 e^(4.748 SIGD − 0.0204 I3WP + 0.007193 ALKP) | 2.07 | 127 | 139 | 154 | constant [illegible]; z = −0.098
13 Urban, Modeled, TDM | GVRD | 1.63052 VKT^0.6887 e^(0.07924 SCVC − 0.0000207 CORE + 0.000000912 DRIVE) | 1.9 | 483 | 510 | 513 | constant 1.4; VKT 15; SCVC 9; CORE −5; DRIVE 2
13 Urban, Modeled, TDM | Transferred | 20.73 VKT^0.6887 e^(0.07924 SCVC − 0.0000207 CORE + 0.000000912 DRIVE) | 1.18 | 176 | 157 | 164 | constant [illegible]; z = 0 (further digits illegible)
14 Urban, Measured, TDM | GVRD | 43.7285 TLKM^0.5762 e^(0.02702 SCC − 0.0000277 CORE + 0.000123 TCM) | 1.5 | 484 | 517 | [illegible] | constant 14; TLKM [illegible]; SCC 7; CORE −6; TCM 3
14 Urban, Measured, TDM | Transferred | 51.1 TLKM^0.5762 e^(0.02702 SCC − 0.0000277 CORE + 0.000123 TCM) | 1.17 | 228 | 146 | 155 | constant [illegible]; z [illegible]

From the results in Table 4.3, a number of outcomes were observed:
o All z statistics were close to zero, suggesting that all GVRD urban models were transferable to Kelowna. The z score served as the basis for accepting successful transferability, as described by Sawalha and Sayed (2005b) and Lovegrove (2006, 2007).
o However, the statistical tests at the 95% level of confidence were not met for all transferred models. The urban Modeled and Measured TDM CPMs both exceeded the target chi-square values, casting doubt on the quality of the TDM input data. The subjective determination of SCC, and the poor VC output from TMODEL used in calculating SCVC in the TDM group, may have produced these poorly fitting models.
o The κ values in the transferred models were lower than in the original GVRD models for the following groups:
• Exposure modeled (Model Group #1)
• S-D modeled (Model Group #5)
• Network modeled (Model Group #9)
• TDM modeled (Model Group #13)
• TDM measured (Model Group #14)
The κ values in the transferred models were higher than in the original GVRD models for the following groups:
• Exposure measured (Model Group #2)
• S-D measured (Model Group #6)
• Network measured (Model Group #10)
A transferred model generally forces the fit of predefined variables, coefficients, and model forms onto a new dataset. Intuitively, therefore, the new κ value of the transferred model should be smaller (indicating higher variability) than that of the original model from which it was transferred, assuming the same number of data points is involved.
If the same model is transferred to two different datasets, one smaller and one larger, then the new κ should be larger for the smaller dataset than for the larger one (Lovegrove, 2006, 2007). Even so, it is unclear how the new transferred κ should compare with the original κ when transferring to a smaller dataset. On one hand, the new κ should be smaller because of the forced fit; on the other hand, it could be larger because the dataset is smaller. The resulting κ value therefore depends on both the size and the quality of the data. Hence, the lower κ values in the modeled exposure and S-D models may be due to the poor quality of the VC variable they include. Similarly, the κ values for the modeled and measured TDM models were also lower, casting doubt on the quality of the TDM input data, especially given the subjective CORE and SCC variables in the TDM model group, as described earlier. However, the higher κ values in the measured exposure, S-D and network models suggested that the measured input data were much better than the modeled input data. These higher κ values may also be due to the smaller dataset for Kelowna (479 GVRD urban zones versus 145 Kelowna urban zones), since less scatter/dispersion could intuitively be expected, resulting in a higher κ value.
o All sample t-values for the new constant terms (a0) in the recalibrated transferred models were well above the critical t value of 1.96 at the 95% confidence level.
o The shape parameter κ averaged 1.26 for the modeled models and 1.85 for the measured models. This again suggested that the measured models were better-fit models (less dispersion) than the modeled models upon transfer.
This result also supports the finding of Lovegrove (2006, 2007) that TLKM can be used as a measured exposure variable, without requiring TMODEL or other complex transportation modeling resources to produce macro-level CPMs.
o All the near-zero z values found here were much lower than those found by Sawalha & Sayed (2005b) when transferring micro-level CPMs between Vancouver and Richmond. These improved z values are quite encouraging for macro-level CPM transferability, especially since two completely different datasets, from different geographic areas and time periods, were used.

4.3.2 Rural Models

The same maximum likelihood approach was used to transfer the GVRD rural models to Kelowna, going from the GVRD's 93 rural zones to Kelowna's 227 rural zones. The resulting transferred models are shown in Table 4.4. It lists the CPMs transferred from GVRD to Kelowna (using GVRD rural model forms but Kelowna 2003 data) along with the CPMs developed for the GVRD rural class by Lovegrove (2006, 2007) using 1996 data, for comparison.

Table 4.4: Transferability Results for Kelowna Rural Models
(Transferred models use GVRD coefficients and model form with Kelowna 2003 data.)
Group | Model | Model form (Total Collisions / 3 yrs =) | κ | Pearson χ² | SD | χ²(0.05, dof) | t-statistics and z score
3 Rural, Modeled, Exposure | GVRD | 0.32368 VKT^0.6478 e^(2.868 VC) | 2.4 | 67 | 87 | 101 | constant −2; VKT 10; VC 4
3 Rural, Modeled, Exposure | Transferred | 0.43 VKT^0.6478 e^(2.868 VC) | 0.46 | 286 | 221 | 212 | constant [illegible]; z = 0.002 (sign illegible)
4 Rural, Measured, Exposure | GVRD | 1.92 TLKM^[illegible] | 1.6 | 94 | 90 | 103 | constant 3; TLKM 9
4 Rural, Measured, Exposure | Transferred | 14.11 TLKM^[illegible] | 0.76 | 192 | 206 | 207 | constant 29.95; z = 0.415
7 Rural, Modeled, S-D | GVRD | 0.311 VKT^0.6344 e^(2.409 VC + 0.3529 NHD) | 2.7 | 67 | 88 | 98 | constant [illegible]; VKT 10; VC [illegible]; NHD 2
7 Rural, Modeled, S-D | Transferred | 0.225 VKT^0.6344 e^(2.409 VC + 0.3529 NHD) | 0.61 | 230 | 209 | 205 | constant [illegible]; z = −0.002
8 Rural, Measured, S-D | GVRD | 0.05546 TLKM^1.337 e^([illegible] UNEMP + 0.6217 NHD) | 2.0 | 78 | 88 | 100 | constant −4; TLKM 9; UNEMP 3; NHD 4
8 Rural, Measured, S-D | Transferred | 6.98 TLKM^1.337 e^([illegible] UNEMP + 0.6217 NHD) | 0.83 | 200 | 202 | 203 | constant 11; z = 0 (further digits illegible)
11 Rural, Modeled, Network | GVRD | 0.3189 VKT^0.654 e^(1.887 VC + 93.76 SIGD) | 2.7 | 78 | 89 | 100 | constant −2; VKT 10; VC 2; SIGD 3
11 Rural, Modeled, Network | Transferred | 4.13 VKT^0.654 e^(1.887 VC + 93.76 SIGD) | 0.94 | 180 | 214 | 214 | constant [illegible]; z = 0 (further digits illegible)
12 Rural, Measured, Network | GVRD | 0.4456 TLKM^[illegible] e^(193.7 SIGD + 0.0100 IALP − 0.0263 LLKP) | 2.8 | 77 | 85 | 98 | constant −1.4; TLKM 10; SIGD [illegible]; IALP 2; LLKP −5
12 Rural, Measured, Network | Transferred | 13.73 TLKM^[illegible] e^(193.7 SIGD + 0.0100 IALP − 0.0263 LLKP) | 1.00 | 157 | 223 | 222 | constant 33.2; z = 0 (further digits illegible)
15 Rural, Modeled, TDM | GVRD | 0.303613 VKT^0.6429 e^(2.574 VC + 0.1661 TCD) | 2.6 | 67 | 88 | [illegible] | constant −3; VKT 10; VC 3; TCD 2
15 Rural, Modeled, TDM | Transferred | 0.258 VKT^0.6429 e^(2.574 VC + 0.1661 TCD) | 0.57 | 241 | 209 | 204 | constant −12.9; z = −0.003
16 Rural, Measured, TDM | GVRD | 0.162513 TLKM^1.496 e^(−0.004986 CRP) | 1.8 | 86 | 89 | [illegible] | constant −1.11; TLKM 10; CRP −2
16 Rural, Measured, TDM | Transferred | 3.53 TLKM^1.496 e^(−0.004986 CRP) | 0.70 | 251 | 206 | 207 | constant [illegible]; z = 0.044

From these results, the following observations can be made about the transfer of the GVRD rural models to Kelowna:
o All the z statistics were close to zero, suggesting that all rural models were also transferred successfully.
o However, the goodness-of-fit test at the 95% level of confidence was not met for some of the models:
• Exposure modeled model (Model Group #3)
• S-D modeled model (Model Group #7)
• TDM modeled model (Model Group #15)
• TDM measured model (Model Group #16)
This was likely because of the poor-quality VC output from TMODEL in the exposure, S-D and TDM modeled models. For the TDM measured model, a possible reason for the poor fit is the subjective determination of the TDM variable Core Area (CRP = CORE/AR, %).
o The new κ values of the transferred models were lower than in the original GVRD models for all the rural model groups. This may be due to the larger dataset for the rural class (227 rural zones in Kelowna compared with 93 rural zones in GVRD), since higher scatter/dispersion could intuitively be expected, resulting in a lower κ value. These lower κ values also suggest that the rural transferred models fit relatively worse than the urban transferred models.
o All sample t-values for the new constant terms (a0) were well above the critical t value of 1.96 at the 95% level of confidence.
o The average shape parameter κ was 0.645 for the modeled models and 0.855 for the measured models, again suggesting that the measured models were better-fit models than the modeled models, and supporting the use of TLKM as a measured exposure variable.
o All the near-zero z values for the rural models were lower than those found by Sawalha & Sayed (2005b) when transferring micro-level CPMs between Vancouver and Richmond, again indicating results that are encouraging for macro-level CPM transferability.

4.4 Transferred vs. Developed Models

The results of the transferability tests provided enough evidence that the GVRD 1996 models were transferable to Kelowna in 2003. Hence, macro-level CPM transferability across different spatial-temporal regions appeared to be feasible and successful. However, the new Kelowna models developed directly from Kelowna data showed that the sign (i.e. +/−) of some variables was counterintuitive and inconsistent with the GVRD model results. In addition, in a large number of models the t-statistics of the variables were insignificant at the 95% level of confidence. These problems may be due to Kelowna's smaller geographic size and its less advanced network model with its poor output, meaning that the data may have been poorer than those of the GVRD. The transferred models, on the other hand, contained the same explanatory variables as the GVRD models. The GVRD models are based on a larger dataset and can be considered more accurate and a better reflection of the association between collisions and the independent variables. The coefficients of those variables were forced to remain the same, and only the constant term and the shape parameter were output by GLIM4. Hence, the signs of the parameters remained the same as in the GVRD models. Thus, the 1996 GVRD models, which were quite reliable and were based on good-quality data and a sound methodological background, could be applied to Kelowna in a different time period and in a completely different geographical region.
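The recalibration described above (explanatory-variable coefficients held at their GVRD values, with only the constant term and the shape parameter re-estimated by maximum likelihood) can be sketched in plain Python. This is an illustrative stand-in, not GLIM4's "OFFSET" routine: the function names, the bisection on the score equation, and the golden-section search over ln κ (which assumes a single-peaked profile likelihood within the search range) are all assumptions of this sketch.

```python
import math

def nb_loglik(y, mu, kappa):
    """Negative binomial log-likelihood with mean mu and shape parameter kappa."""
    ll = 0.0
    for yi, mi in zip(y, mu):
        ll += (math.lgamma(yi + kappa) - math.lgamma(kappa) - math.lgamma(yi + 1.0)
               + kappa * math.log(kappa / (kappa + mi))
               + yi * math.log(mi / (kappa + mi)))
    return ll

def fit_constant(y, m, kappa):
    """Solve the score equation for a0 with kappa and the fixed part m held constant.

    m[i] is the model prediction without its lead constant, i.e. the
    exposure term times the exponential term with the original coefficients.
    """
    def score(log_a0):
        a0 = math.exp(log_a0)
        return sum(kappa * (yi - a0 * mi) / (kappa + a0 * mi) for yi, mi in zip(y, m))
    lo, hi = -30.0, 30.0          # search bounds for ln(a0)
    for _ in range(200):          # bisection: score decreases as a0 grows
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def recalibrate(y, m, log_kappa_range=(-5.0, 7.0)):
    """Return (a0, kappa) maximizing the likelihood with all other terms fixed."""
    def profile(log_kappa):
        kappa = math.exp(log_kappa)
        a0 = fit_constant(y, m, kappa)
        return nb_loglik(y, [a0 * mi for mi in m], kappa)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = log_kappa_range
    for _ in range(100):          # golden-section search on ln(kappa)
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if profile(c) > profile(d):
            hi = d
        else:
            lo = c
    kappa = math.exp(0.5 * (lo + hi))
    return fit_constant(y, m, kappa), kappa

# Made-up counts and fixed model parts for eight zones (not Kelowna data).
counts = [3, 0, 12, 7, 1, 25, 4, 9]
fixed_part = [1.0] * 8
a0_new, kappa_new = recalibrate(counts, fixed_part)
print(a0_new, kappa_new)
```

Profiling over κ keeps each inner problem one-dimensional, which is why simple bracketing searches suffice here; a general-purpose optimizer or a GLM package could equally be used.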
The results of the transferability study also suggested that the number of data points used for model recalibration does not have a significant impact on the “z” score. The GVRD urban models were transferred from a larger dataset to a smaller one (479 GVRD vs. 145 Kelowna), whereas the rural models were transferred from a smaller dataset to a larger one (93 GVRD vs. 227 Kelowna). The near-zero “z” values resulting from both cases suggested that transferability is desirable when a smaller community lacks good quality data, irrespective of the number of data points it possesses. Even if such a community has sufficient data points to develop its own models, those data may not be of good enough quality to reliably predict the safety estimates of that community. This helps explain the benefits of transferring models over developing models from scratch for a smaller community. While macro-level CPMs have been successfully transferred from GVRD to Kelowna, the following paragraphs provide a brief summary of which urban and rural transferred models could be used as reliable models for the City of Kelowna.

4.4.1 Urban Models

The results presented in this chapter showed that the “z” values were sufficiently close to zero, suggesting that the GVRD urban models were successfully transferable to Kelowna. However, in a small number of models the Pearson χ² statistical test at a 95% level of confidence was not met. This was especially true for the TDM modeled and measured models. It is worth mentioning that adequate data of sound quality ensures that a specific model will provide precise, reliable and credible results. The subjective determination of the SCC variable and the poor output of the VC variable from TMODEL™ when calculating SCVC in the TDM model groups may have resulted in these poorly fitting models.
Also, the modal split is quite different in Kelowna relative to GVRD, with much lower transit usage but more walking and biking due to proximity to the CBD. For a transferability test to be successful, the conditions in the two study areas should be “similar enough” to justify transferring a model from one to the other. Hence additional caution needs to be exercised while using the transferred TDM CPMs. The higher K values of the measured models suggested that the measured models fit better than the modeled models when transferred, because of better data quality. Therefore, it is possible to use TLKM as a measured exposure variable rather than requiring the VKT and VC variables from TMODEL™ or other complex transportation modeling resources.

4.4.2 Rural Models

The near-zero “z” statistics obtained while transferring the rural models suggested that all models were transferred successfully. Nevertheless, the Pearson χ² statistical test at a 95% level of confidence was not met in the exposure and S-D modeled models and in the TDM models, due to poor data quality. However, the “z” value for this model was zero to two decimal places (z = 0.002), suggesting that the model was transferred successfully anyhow. In this case also, the greater values of the measured model shape parameter, K, over those of the modeled models suggested that the measured models fit better than the modeled models. While the values of K for the transferred rural models were lower than those of the GVRD rural models, the values for the transferred urban models were higher than those of the GVRD urban models. Therefore, it appeared that the lower rural K value was due to transferring from a smaller dataset to a larger one (GVRD's 93 vs. Kelowna's 227), reflecting a larger sample size. Similarly, the higher K value for the urban class was likely due to transferring from a larger dataset to a smaller one (GVRD's 479 vs. Kelowna's 145), reflecting a smaller sample size.
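The Pearson χ² check referred to in this section can be sketched as follows, assuming the usual negative binomial variance μ + μ²/K and a critical value with n − p degrees of freedom. The function names and the toy data are hypothetical, and the Wilson-Hilferty formula is used here only as a standard-library approximation to the χ² quantile that the GLIM4 logs obtain with %CHD.

```python
import math
from statistics import NormalDist

def pearson_chi2(observed, predicted, k):
    """Sum of squared residuals scaled by the NB variance mu + mu^2 / k."""
    return sum((y - mu) ** 2 / (mu + mu * mu / k)
               for y, mu in zip(observed, predicted))

def chi2_critical(df, level=0.95):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(level)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def goodness_of_fit(observed, predicted, k, n_params, level=0.95):
    """Return (statistic, critical value, True if the fit is acceptable)."""
    df = len(observed) - n_params
    stat = pearson_chi2(observed, predicted, k)
    crit = chi2_critical(df, level)
    return stat, crit, stat < crit

# Hypothetical zones: observed collisions and model predictions.
observed = [2, 0, 5, 3, 8, 1, 4, 0, 6, 2]
predicted = [2.4, 0.6, 4.9, 1.7, 7.3, 1.5, 3.9, 0.9, 5.6, 2.2]
print(goodness_of_fit(observed, predicted, k=1.5, n_params=1))
```

For 135 and 137 degrees of freedom the approximation gives about 163.1 and 165.3, in line with the Target values printed in the GLIM4 logs in Appendices A and B.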
4.5 Summary

In this chapter, results have been presented for the macro-level CPMs developed for Kelowna using the same model forms used in the GVRD but with 2003 data. Also presented are the CPMs transferred from the GVRD to Kelowna, using GVRD model forms but Kelowna 2003 data. Models have been developed and transferred that predict the level of safety in both urban and rural areas. The results suggested that it was possible to transfer macro-level CPMs across different geographic regions and time periods. The results also showed that in a geographical area that is smaller in size and has a less advanced modeled network, it may be beneficial to recalibrate an existing reliable CPM instead of developing a new model. This was because, in some cases, the developed models may not provide a statistically predictive association between traffic safety and neighbourhood characteristics pertaining to traffic exposure, road network, socio-demographics, and transportation demand management. Hence, the results suggested that macro-level CPM transferability was feasible and no more complicated than micro-level CPM transferability.

CHAPTER 5 CONCLUSIONS AND RECOMMENDATIONS

5.1 Introduction

This chapter comprises three main sections. In Section 5.2, the thesis summary is presented along with the main research conclusions. Section 5.3 highlights the research contributions by explaining how they add to the current state of knowledge in the field of community-based macro-level CPM transferability. Finally, Section 5.4 describes some recommendations for future research topics, which can enhance and strengthen the methodologies described in this thesis.
5.2 Summary & Conclusions

Given the enormous social and economic costs of road collisions and the lack of available, good quality collision data, there is a recognized need to confirm a methodology for community-based macro-level CPM transferability between different time-space regions for use in proactive road safety improvement programs. Recent research has focused on traditional micro-level CPMs and their transferability, either to a different time or to a different geographical region. Less attention has been given so far to the combined spatial and temporal transferability of macro-level CPMs. Given that macro-level CPMs can be a reliable empirical tool to address road safety explicitly and can be used as part of proactive road safety improvement programs, a confirmed methodology for macro-level CPM transferability needed to be developed. The reason is that it is considered more beneficial if CPMs developed for one jurisdiction in one period of time can be applied to a different period in the same or another jurisdiction. This is particularly important for communities that do not have sufficient good quality data to develop their own macro-level collision models.

This thesis described an approach for conducting macro-level CPM transferability between different spaces and times. The objective of this research was to test whether community-based macro-level collision prediction models could produce reliable safety estimates when transferred between different spatial-temporal regions. In doing so, the GVRD macro-level collision prediction models developed by Lovegrove and Sayed (2006) and Lovegrove (2006, 2007) were used. Using those models and data from the GVRD and the CORD, the transferability test involved recalibration of the 1996 GVRD models to Kelowna, using 2003 data.
Following the methodology described by Lovegrove (2006, 2007) for GVRD macro-level CPMs, the first part of this research involved developing macro-level CPMs for the City of Kelowna using 2003 data. The next part of this thesis involved recalibration of the 1996 GVRD models to see whether the recalibrated models could yield reliable predictions of the level of safety for Kelowna neighbourhoods in the year 2003. Finally, a comparison was made between the results of Kelowna's own developed models and the models transferred from GVRD to Kelowna to determine which models would yield better results. The results of this thesis suggested that macro-level CPM transferability was feasible, preferable, and no more complicated than micro-level CPM transferability.

5.3 Research Contributions

The following six items represent the main contributions of this research:

5.3.1 Macro-level CPM transferability between time-space regions is feasible and successful.

The results of the transferability test provided enough evidence to suggest that the GVRD 1996 models were successfully transferred to Kelowna in the year 2003. The “z” score values, which serve as the basis for accepting successful transferability of models, were all close to zero, suggesting that all GVRD urban and rural models were transferable. Also, all sample t-values for the new a0's in the recalibrated transferred models met the t-test statistic at the 95% confidence level. Moreover, the near-zero “z” values found in this research were much lower than those found by Sawalha & Sayed (2005b) when transferring micro-level CPMs between Vancouver and Richmond. These improved “z” values are quite encouraging for macro-level CPM transferability, especially since two completely different datasets were used in two separate space-time regions.

5.3.2 It is beneficial to transfer models rather than developing own models for some communities.
While developing Kelowna models directly from Kelowna data, the statistical associations of many of the variables mirrored those observed in the GVRD models. However, the observed associations of several key variables with collisions for some of the model groups were found to be inconsistent with those of the GVRD model results. Also, in a large number of models the t-statistic of some variables was insignificant at a 95% level of confidence. A possible explanation for these unusual results is the smaller geographic size of Kelowna combined with poor data quality from its less advanced modeled network, which means the data may have been poorer than those of the GVRD. On the other hand, the transferred models contained the same coefficients and explanatory variables as the GVRD models; only the lead constant term and the shape parameter were output from GLIM4. Despite this ‘forced’ fit, the models had better goodness of fit and t-statistic test results. This helps explain the benefits of transferring models over developing models for a smaller community that does not possess sufficient collision statistics of sound quality to enable the development of reliable collision prediction models.

5.3.3 Transferability is desirable whenever a community lacks good quality data, irrespective of the number of available data points.

The results of the transferability study in this thesis suggested that the number of data points used for model recalibration does not have a significant impact on the “z” score. The GVRD urban models were transferred from a larger dataset to a smaller one, whereas the rural models were transferred from a smaller dataset to a larger one. The near-zero “z” values resulting in both cases demonstrated that the number of data points is not a limiting factor for model transferability.
Moreover, the results of Kelowna's own developed “rural” models indicated very poor model fit, despite using a larger dataset than the GVRD. Hence, transferability is clearly desirable wherever and whenever data quality is an issue.

5.3.4 Demonstration of the validity of some of the transferred models.

While macro-level CPMs have been successfully transferred from GVRD to Kelowna, a small number of models did not meet the Pearson χ² statistical test at the desired 95% level of confidence. This was specifically true for the Urban TDM models (both measured and modeled) and for the Rural Exposure and S-D modeled models and TDM models (both measured and modeled). The subjective determination of the Core Area (CRP) variable and the SCC variable, which also influenced the calculated SCVC variable in the TDM model groups, combined with the poor output of VC data from the less advanced planning model TMODEL2, was likely the reason behind this poor fit. Hence additional caution needs to be exercised while using these models as transferred models for Kelowna, and further research effort would be needed in this area to improve data quality.

5.3.5 The value of the shape parameter K of the transferred models is strongly influenced by the sample size.

While sample size influences are subservient to data quality, sample size does still play an important role. In the case of the urban models, when a larger dataset was transferred to a smaller one (GVRD's 479 TAZs to Kelowna's 145 TAZs), the K values for the transferred urban models were higher than those of the original GVRD models, thus reflecting a smaller sample size. On the other hand, when a smaller dataset was transferred to a larger one (GVRD's 93 TAZs to Kelowna's 227 TAZs), the values of K for the transferred rural models were lower than those of the GVRD rural models, thus reflecting a larger sample size. Therefore, sample size is important in macro-level CPM transferability, governing the K value.
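The role of K can be made concrete with the variance function of the negative binomial distribution, assuming the usual parameterization in which K is the shape parameter and μ is the predicted collision frequency of a zone. The value μ = 10 is a hypothetical illustration; 0.645 and 0.855 are the average K values reported in Chapter 4 for the transferred rural modeled and measured models.

```latex
\begin{align*}
\operatorname{Var}(Y) &= \mu + \frac{\mu^{2}}{K} \\
\mu = 10,\ K = 0.645: \quad \operatorname{Var}(Y) &= 10 + \frac{100}{0.645} \approx 165.0 \\
\mu = 10,\ K = 0.855: \quad \operatorname{Var}(Y) &= 10 + \frac{100}{0.855} \approx 127.0
\end{align*}
```

Under this assumption a lower K means more scatter around the same predicted value, which is the sense in which the lower K values are read as a poorer fit.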
5.3.6 GIS can be implemented as a useful tool in safety analysis.

GIS provided increased accuracy of spatial data and the ability to perform spatial queries. In this thesis the ‘spatial query’ in ArcGIS 9.0 helped to spatially aggregate collision data by zone. GIS also provided tools to combine data and join attributes using a specific feature as the selection criterion. For example, the zone number of each zone was used to combine the population density with the TAZs, thus graphically displaying the urban and rural classes according to population density. ArcGIS 9.0 also greatly facilitated the task of extracting the TLKM data, using the ‘overlay’ technique and ‘ArcToolbox’, which made it possible to “cookie cut” the digital road map and thus associate the sum of the road links that fall in a zone with a particular TAZ. Hence, GIS can be used as a useful tool in the development, management and aggregation of the various data involved in the development of macro-level CPMs.

5.4 Recommendations for Future Research

This section presents some recommendations on future research to enhance and strengthen the methodologies described in this thesis, as well as to address various issues raised. These are as follows:

• The GVRD's “Urban” and “Rural” TAZs were defined somewhat subjectively, using mostly land use mapping, site visits and personal knowledge of the region. However, for the City of Kelowna, this area classification was based on an absolute population density (pop/km²) measure. Therefore, future research is needed on ways to define the urban-rural class in a consistent and reliable manner.

• For valuation of the core size (CORE), shortcut capacity (SCC) and shortcut attractiveness (SCVC) variables, manual extraction was done due to the unavailability of automated techniques. This manual extraction was a very time consuming and somewhat subjective process, and may have influenced model fits.
Hence, it is recommended that additional research be done in this area so that this manual and time consuming data extraction process can be shortened and improved using automated GIS extraction techniques.

• The modeled exposure data (VKT, VC, SCVC) were extracted using TMODEL2, ESRI ArcGIS 9.0 and MS Excel. This process was quite complex and multi-platform, involving several software packages and manual interventions to integrate the dataset. Also, the unavailability of a geo-referenced road network for the modeled roads output from TMODEL2 posed a significant barrier and made it difficult to associate the modeled road network with the geo-coded TAZ map. A digital version of the modeled road network with a known spatial reference, if available, would make the overall process much more accurate and would reduce the time and effort to a minimum.

• Overall, almost every part of the data extraction and aggregation processes in this research was based on several complex, multi-platform and time consuming methods, which included manual (i.e. using traditional spreadsheets), modeled (i.e. TMODEL™) and automated (i.e. GIS) techniques. Development of well-documented and standardized automated processes is worthwhile to aid data quality and assembly, thereby reducing the required time and effort.

Thus, in summary, an improved level of data quality with consistent and automated data extraction and aggregation processes is recommended for future research on community-based macro-level CPMs, which would facilitate continuing progress on proactive road safety improvement programs.

REFERENCES

Atherton, T.J., and Ben-Akiva, M.E. (1976). “Transferability and updating of disaggregation travel demand models”, Transport. Res. Rec. 610, 12-18.

Ben-Akiva, M. (1981). “Issues in transferring and updating travel-behaviour models”. In: Stopher, P.R., et al. (Eds.), New Horizons in Travel Behaviour. Lexington Books, Lexington, MA, pp. 665-686.

Bonneson, J.A., and McCoy, P.T. (1993). “Estimation of safety at two-way stop-controlled intersections on rural highways”, Transportation Research Record 1401, Journal of the Transportation Research Board (TRB), TRB, National Research Council, Washington, D.C., pp. 83-89.

Census Canada (2003). Statistics from the 2003 Census, Government of Canada, Ottawa, Canada.

de Leur, Paul, and Sayed, Tarek (2003). “A framework to proactively consider road safety within the road planning process”, Canadian Journal of Civil Engineering, Volume 30(4), National Research Council, Canada, pp. 711-719.

de Leur, Paul, and Sayed, Tarek (2002). “Development of a Road Safety Risk Index”, Transportation Research Record 1784, Washington, D.C., pp. 33-42.

Hadayeghi, Alireza, Shalaby, Amer S., Persaud, Bhagwant N., and Cheung, Carl (2005). “Temporal transferability and updating of zonal level accident prediction models”, Accident Analysis & Prevention, 38, Elsevier Ltd, Amsterdam, The Netherlands.

Hadayeghi, Alireza, Shalaby, Amer S., and Persaud, Bhagwant N. (2003). “Macro-Level Accident Prediction Models for Evaluating the Safety of Urban Transportation Systems”, Presented at Transportation Research Board Annual Meeting, January, TRB, Washington, D.C.

Harwood, D.W., Council, F.M., Hauer, E., et al. (2000). “Prediction of the expected safety performance of rural two-lane highways”, FHWA-RD-99-207, US Department of Transportation.

Hauer, E., Ng, J.C.N., and Lovell, J. (1988). “Estimation of Safety at Signalized Intersections”, Transportation Research Record 1185, Journal of the TRB, TRB, Washington, D.C., pp. 48-61.

Herbel, S.B. (2004). “Planning It Safe to Prevent Traffic Deaths and Injury”, North Jersey Planning Authority Inc., (http://www.njtpa.org’planning/rtp2030/safetv st1v/safety study documents/safety article.pdfl, pp. 7-27.

Ho, Geoffrey, and Guarnaschelli, Marco (1998). “Developing a Road Safety Module for the Regional Transportation Model, Technical Memorandum One: Framework”, ICBC, December, Vancouver, Canada.

Kmet, Leanne, Brasher, Penny, and Macarthur, Colin (2003). “A small area study of motor vehicle fatalities in Alberta, Canada”, Accident Analysis & Prevention, Volume 35, Elsevier Ltd, Amsterdam, The Netherlands, pp. 177-182.

Kulmala, R. (1995). “Safety at rural three- and four-arm junctions: Development and application of accident prediction models”, Dissertation for the degree of Doctor of Technology, Technical Research Centre of Finland, VTT Publication 233, Espoo, Finland.

Ladron de Guevara, Felipe, Washington, Simon P., and Oh, Jutaek (2004). “Forecasting Crashes at the Planning Level: A Simultaneous Negative Binomial Crash Model Applied in Tucson, Arizona”, Presented at Transportation Research Board 2004 Annual Meeting, January, TRB, Washington, D.C.

LaScala, Elizabeth A., Gruenewald, Paul J., and Johnson, Fred W. (2003). “An ecological study of the locations of schools and child pedestrian injury collisions”, Accident Analysis & Prevention, Volume 36, Elsevier Ltd, Amsterdam, The Netherlands, pp. 569-576.

Levine, N., Kim, K.E., and Nitz, L.H. (1995). “Spatial Analysis of Honolulu Motor Vehicle Crashes: II. Zonal Generators”, Accident Analysis & Prevention, 27(5), Elsevier Ltd, Amsterdam, The Netherlands, pp. 675-685.

Lord, Dominique, and Persaud, Bhagwant N. (2004). “Estimating the safety performance of urban road transportation networks”, Accident Analysis & Prevention, 36, Elsevier Ltd, Amsterdam, The Netherlands, pp. 609-620.

Lord, D. (2000). “The Prediction of Accidents on Digital Networks: Characteristics and Issues Related to the Application of Accident Prediction Models”, Ph.D. thesis, Department of Civil Engineering, University of Toronto, 2000.

Lovegrove, G. (2007). “Road Safety Planning, New Tools for Sustainable Road Safety and Community Development”, VDM Verlag Dr. Müller, Berlin, Germany.

Lovegrove, G. (2006). “Macro-Level Collision Prediction Models for Evaluating Neighbourhood Traffic Safety”, PhD Thesis, University of British Columbia, Vancouver, B.C., Canada.

Lovegrove, G., and Sayed, T. (2006). “Macro-Level Collision Prediction Models for Evaluating Neighbourhood Traffic Safety”, Canadian Journal of Civil Engineering, 33(5): 609-621, NRC, Canada.

Miaou, S.P. (1996). “Measuring the Goodness-of-Fit of Accident Prediction Models”, Report No. FHWA-RD-96-040, Federal Highway Administration, McLean, VA.

McCullagh, P., and Nelder, J.A. (1989). “Generalized Linear Models”, Chapman and Hall, New York.

Miaou, S., and Lum, H. (1993). “Modelling vehicle accident and highway geometric design relationships”, Accident Analysis & Prevention, 25(6), Elsevier Ltd, Amsterdam, The Netherlands, pp. 689-709.

Numerical Algorithms Group (NAG) (1996). “The GLIM System, Release 4 Manual”, Royal Statistical Society, Oxford, Great Britain.

Lovegrove, G., and Stanos, J. (2006). “Okanagan Valley Quest Model Transportation Database Technical Report”, Okanagan, Kelowna, 2006.

Persaud, B.N., Lord, D., and Palmisano, J. (2002). “Calibration and Transferability of Accident Prediction Models for Urban Intersections”, Transportation Research Record 1784, Journal of the TRB, TRB, Washington, D.C., pp. 57-64.

Poppe, F. (1997a). “Traffic Models: Inner Areas and Road Dangers”, SWOV Report No. R-97-10, Leidschendam, Netherlands.

Poppe, Frank (1997b). “‘Sustainably safe’ traffic system and accessibility: a pilot project for the Central Netherlands”, SWOV Report R-97-40, Leidschendam, Netherlands.

Poppe, F. (1995). “Risk Figures in the Traffic and Transport Evaluation Module (EVV): A Contribution to the Definition Study Traffic Safety in EVV”, SWOV Report No. R-95-21, Leidschendam, Netherlands.

Sawalha, Ziad, and Sayed, Tarek (2005a). “Traffic Accident Modeling: Some Statistical Issues”, (in publication) CSCE Journal, CSCE, Canada.

Sawalha, Ziad, and Sayed, Tarek (2005b). “Transferability of accident prediction models”, (in publication) Safety Science, Elsevier Ltd.

Sawalha, Z., and Sayed, T. (2001). “Evaluating Safety of Urban Arterial Roadways”, Journal of Transportation Engineering, 127(2), ASCE, USA, March/April, pp. 151-158.

Sawalha, Z., and Sayed, T. (1999). “Accident Prediction and Safety Planning Models for Urban Arterial Roadways”, Draft Report for the Insurance Corporation of British Columbia by the Department of Civil Engineering, University of British Columbia, Vancouver, Canada.

Sayed, T., and de Leur, P. (2001). “Forecasting Traffic Safety”, Proceedings from the Fourth International Conference on Accident Investigation, Reconstruction, Interpretation and the Law, hosted by the Civil Engineering Department at the University of British Columbia, August 13-16, 2001, Vancouver, Canada, pp. 187-196.

United Nations Road Safety Collaboration (2007). “Improving Global Road Safety”, WHO, August 2007.

Van Schagen, Ingrid, and Theo Janssen (2000). “Managing Road Transport Risks: Sustainable Safety in the Netherlands”, Risk Management in Transport, IATSS Research, volume 24(2), publisher, place, country, pp. 18-27.

Van Minnen, J. (1999). “The suitable size of residential areas: a theoretical study with testing to practical experiences”, SWOV Report No. R-99-25, Leidschendam, Netherlands.

Vogt, A., and Bared, J. (1998). “Accident models for two lane rural segments and intersections”, Transportation Research Record, 1635, 18-29.

Wegman, Fred (1996). “Sustainable Safety in the Netherlands”, SWOV Institute for Road Safety Research, Leidschendam, The Netherlands.

APPENDICES

APPENDIX A. GLIM4 OUTPUT SAMPLE FOR MODEL DEVELOPMENT

For Model Development in Group 1: Urban, modeled, Exposure

[o] GLIM 4, update 9 for Microsoft Windows Intel on 21 Jan 2007 at 15:14:47
[o] (copyright) 1992 Royal Statistical Society, London
[i] ?
$input ‘ukexpi.txt’$
[e] $C Model development: Urban models for Expi for Kelowna$
[e] $C Crash variable=T3$
[e] $C Urban zones of kelowna has 138 lines of data$
[e] $Units 138$
[e] $data Zone T3 VKT VC$
[e] $Dinput ‘ukexpi.prn’$
[e] 1 2 208 0.401
[e] 5 0 223 0.737
(Remaining data units have been removed from here, but are still in original log file)
[e] 370 47 322 0.578
[e] 371 47 86 0.269
[e] $input ‘macrolib’ negbin$
[e] $calculate LVKT=%log(VKT)$
[e] $C MODEL: EXP - modelled$
[e] $YVar T3 $Error P $Link L$
[e] $Fit LVKT+VC$
[o] scaled deviance = 3573.5 at cycle 4
[o] residual df = 135
[e] $number theta=0 $:method=1 $use negbin theta method $display D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 153.23 (change = -3420.) at cycle 2
[o] residual df = 135 (change = 0)
[o] ML Estimate of THETA = 1.828
[o] Std Error = ( 0.2227)
[o] NOTE: standard errors of fixed effects do not
[o] take account of the estimation of THETA
[o] 2 x Log-likelihood = 50216. on 135 df
[o] 2 x Full Log-likelihood = -1314.
[o] Scaled deviance is 153. on 135 d.f. from 138 observations
[o] change is -3420. for 0 d.f.
[o] estimate s.e. parameter
[o] 1 0.6786 0.3456 1
[o] 2 0.6361 0.07040 LVKT
[o] 3 -1.183 0.3178 VC
[o] scale parameter 1.000
[e] $Extract %PE %SE$
[e] $Calculate tstata=%PE/%SE$
[e] $Calculate Target=%CHD(0.95, %DF)$
[e] $Calculate Chi2=%X2$
[e] $Look %PE %SE tstata Chi2 Target$
[o] %PE %SE TSTATA CHI2 TARGET
[o] 1 0.6786 0.34562 1.963 129.6 163.1
[o] 2 0.6361 0.07040 9.036 129.6 163.1
[o] 3 -1.1829 0.31783 -3.722 129.6 163.1
[e] $Calculate ao=%exp(%PE)$
[e] $Look ao$
[o] AO
[o] 1 1.9711
[o] 2 1.8891
[o] 3 0.3064
[e] $Display C T M$
[o] correlations between parameter estimates
[o] 1 1.0000
[o] 2 -0.9327 1.0000
[o] 3 0.3430 -0.6259 1.0000
[o] 1 2 3
[o] working triangle
[o] 1 240.4
[o] 2 5.795 331.8
[o] 3 0.4304 0.1386 9.899
[o] 4 3.856 0.4721 -1.183 129.2
[o] 1 2 3 4
[o] Current model:
[o] number of observations in model is 138
[o] y-variate T3
[o] weight *
[o] offset *
[o] probability distribution is defined via the macro NB_FIT.
[o] link function is defined via the macro NB_LINK.
[o] scale parameter is 1.000
[o] linear model:
[o] terms: 1+LVKT+VC
[e] $Extract %CD$
[e] $Look Zone %CD$
[o] ZONE %CD
[o] 1 1.000 0.0045093140
[o] 2 5.000 0.0183135550
(Remaining outliers have been removed from here, but are still in original log file)
[o] 137 370.000 0.0002675313
[o] 138 371.000 0.0082673738
[e] $return$

APPENDIX B. GLIM4 OUTPUT SAMPLE FOR MODEL RECALIBRATION

For Recalibrating Model in Group 1: Urban, Modeled, Exposure

[o] ML Estimate of THETA = 1.067
[o] Std Error = ( 0.1201)
[o] NOTE: standard errors of fixed effects do not
[o] take account of the estimation of THETA
[o] 2 x Log-likelihood = 50134. on 137 df
[o] 2 x Full Log-likelihood = -1396.
[o] Scaled deviance is 158. on 137 d.f. from 138 observations
[o] change is -6968. for 0 d.f.
[o] estimate s.e. parameter
[o] 1 -0.5249 0.08397 1
[o] scale parameter 1.000
[e] $Extract %PE %SE$
[e] $Calculate tstata=%PE/%SE$
[e] $Calculate Target=%CHD(0.95, %DF)$
[e] $Calculate Chi2=%X2$
[e] $Look %PE %SE tstata Chi2 Target$
[o] %PE %SE TSTATA CHI2 TARGET
[o] 1 -0.5249 0.08397 -6.252 162.6 165.3
[e] $Calculate ao=%exp(%PE)$
[e] $Look ao$
[o] AO
[o] 1 0.5916
[e] $Display C T M$
[o] correlations between parameter estimates
[o] 1 1.0000
[o]
[o] working triangle
[o] 1 141.8
[o] 2 -0.5249 162.5
[o] 1 2
[o] Current model:
[o] number of observations in model is 138
[o] y-variate T3
[o] weight *
[o] offset EXPI
[o] probability distribution is defined via the macro NB_FIT.
[o] link function is defined via the macro NB_LINK.
[o] scale parameter is 1.000
[o] linear model:
[o] terms: 1
[e] $return$
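
The post-processing commands that appear in both logs (tstata=%PE/%SE, ao=%exp(%PE), and the comparison of Chi2 with Target) can be mirrored in a few lines of Python. This is a sketch for checking the printed figures, not part of the original GLIM4 workflow. The function and variable names are hypothetical, the input numbers are copied from the logs above, and the prediction function assumes the model form implied by the log link with the terms 1 + LVKT + VC.

```python
import math

def summarize(estimates, std_errors, chi2, target, t_crit=1.96):
    """Recompute t-statistics, the lead constant a0 and the fit decision."""
    rows = []
    for pe, se in zip(estimates, std_errors):
        t = pe / se
        rows.append({"estimate": pe, "t": t, "significant": abs(t) >= t_crit})
    return {
        "rows": rows,
        "a0": math.exp(estimates[0]),  # first parameter is the constant term
        "fit_ok": chi2 < target,       # Pearson chi-square below the 95% target
    }

def predict_collisions(a0, b_vkt, b_vc, vkt, vc):
    """Assumed model form: collisions = a0 * VKT^b_vkt * exp(b_vc * VC)."""
    return a0 * vkt ** b_vkt * math.exp(b_vc * vc)

# Appendix A: model developed from Kelowna data (constant, LVKT, VC).
appendix_a = summarize([0.6786, 0.6361, -1.1829],
                       [0.34562, 0.07040, 0.31783],
                       chi2=129.6, target=163.1)

# Appendix B: recalibrated model, where only the constant is estimated.
appendix_b = summarize([-0.5249], [0.08397], chi2=162.6, target=165.3)

# Prediction for the first zone listed in the Appendix A data (VKT=208, VC=0.401).
zone_1 = predict_collisions(appendix_a["a0"], 0.6361, -1.1829, 208, 0.401)
print(appendix_a, appendix_b, zone_1)
```

Because the inputs are the rounded values printed by GLIM4, the recomputed t-statistics agree with the logs only to about two or three decimal places.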
Transferability of community-based macro-level collision prediction models for use in road safety planning… Khondaker, Bidoura 2008
Item Metadata
Title | Transferability of community-based macro-level collision prediction models for use in road safety planning applications |
Creator |
Khondaker, Bidoura |
Publisher | University of British Columbia |
Date Issued | 2008 |
Description | This thesis proposes a methodology and guidelines for community-based macro-level CPM transferability for road safety planning applications, with models developed in one spatial-temporal region being capable of use in a different spatial-temporal region. In doing this, the macro-level CPMs developed for the Greater Vancouver Regional District (GVRD) by Lovegrove and Sayed (2006, 2007) were used in a model transferability study. Using those models from GVRD and data from the Central Okanagan Regional District (CORD), in the Province of British Columbia, Canada, a transferability test was conducted that involved recalibration of the 1996 GVRD models to Kelowna, in a 2003 context. The case study was carried out in three parts. First, macro-level CPMs for the City of Kelowna were developed using 2003 data, following the earlier research on GVRD CPM development and use. Next, the 1996 GVRD models were recalibrated to see whether they could yield reliable predictions of the safety estimates for Kelowna, in a 2003 context. Finally, a comparison between the results of Kelowna’s own developed models and the transferred models was conducted to determine which models yielded better results. The results of the transferability study revealed that macro-level CPM transferability was possible and no more complicated than micro-level CPM transferability. To facilitate the development of reliable community-based, macro-level collision prediction models, it was recommended that CPMs be transferred rather than developed from scratch whenever and wherever communities lack sufficient data of adequate quality. Therefore, the transferability guidelines in this research, together with their application in the case studies, have been offered as a contribution towards model transferability for road safety planning applications, with models developed in one spatial-temporal region being capable of use in a different spatial-temporal region. |
Extent | 2381473 bytes |
Subject |
Transferability Macro-level Collision prediction models |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2008-12-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0063093 |
URI | http://hdl.handle.net/2429/2867 |
Degree |
Master of Applied Science - MASc |
Program |
Civil Engineering |
Affiliation |
Applied Science, Faculty of Civil Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2008-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |