GRAPH THEORY BASED TRANSIT INDICATORS APPLIED TO RIDERSHIP AND SAFETY MODELS

by

Liliana Quintero-Cano
B.A.Sc., Universidad de los Andes, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
The Faculty of Graduate Studies (Civil Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
October 2011
© Liliana Quintero-Cano 2011

Abstract

Public transportation systems are a fundamental necessity in current times, where sustainability and rising safety costs are important concerns to government officials and the general public. Therefore, the design of public transportation systems is an area of great interest for researchers and practitioners. Nonetheless, there is usually little analysis of network properties during transit design and planning. Moreover, due to the lack of empirical tools, there is not much consideration of transit safety at the planning stage. In this research, a study was performed to explore zonal based network properties applied to bus systems. A new technique to measure network connectivity was developed and applied to a real-world transit system; in addition to the relationship between edges and vertices, it incorporates the influence of transit operational factors (i.e. frequency of routes). Additionally, the effect of bus route transfers was analyzed and modeled by adding intermediate walking transfer links between bus stops. The calculated network properties were applied as explanatory variables in the development of macro-level ridership and collision prediction models. The proposed methodology was applied to the Greater Vancouver Regional District (GVRD) public transportation system and its 577 traffic analysis zones. The developed mathematical models include seven multiple linear regression models which explain transit commuting ridership.
The regression models revealed that ridership is positively linked to network characteristics such as coverage, connectivity, complexity and the local index of transit availability (LITA). In addition, 35 collision prediction models were developed using a Generalized Linear Regression technique, assuming a Negative Binomial error structure. The safety models showed that increased collisions were associated with transit network properties such as connectivity, coverage, overlapping degree and the LITA. The models also revealed a positive relation between collisions and transit physical and operational attributes such as number of routes, frequency of routes, bus density, length of bus routes and 3+ priority lanes, among others.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Thesis Structure
2 Literature Review
  2.1 Characterization of Transit Networks Using Graph Theory Based Indicators
  2.2 Collision Prediction Models
3 Methodology
  3.1 Data
  3.2 Characterization of the Greater Vancouver Regional District (GVRD) Metro Network
  3.3 From Bus Networks to Graphs
  3.4 Characterization of the GVRD Bus Network Using Existent Transit Indicators
  3.5 Proposed Transit Indicators
  3.6 Application of Transit Indicators to Ridership Models
  3.7 Application of Transit Indicators to Collision Prediction Models
4 Results and Discussion
  4.1 The Greater Vancouver Regional District (GVRD) Case Study
  4.2 Application of Transit Indicators in Ridership Models in the GVRD Bus System
  4.3 Application of Transit Indicators in Collision Prediction Models in the GVRD
5 Conclusions
  5.1 Summary
  5.2 Contribution
  5.3 Recommendations for Future Research
Bibliography

List of Tables

3.1 Variables and summary statistics (577 zones) for the Greater Vancouver Regional District (GVRD)
3.2 Existent transit indicators estimated
3.3 Calculation of fmax for the example in Figure 3.10
3.4 Ranking of networks illustrated in Figure 3.11
3.5 Summary of network measurements for the network illustrated in Figure 3.11
3.6 Possible variables per zone to include in the CPMs
4.1 Basic graph theory indicators for the GVRD metro system
4.2 Summary of measurements and connectivity indicators for eight TAZs in the GVRD
4.3 Ranking of the eight TAZs in the GVRD according to various connectivity indicators
4.4 Summary of transit indicators statistics for the GVRD, considering only the bus links (without considering walking links)
4.5 Summary of transit indicators statistics for the GVRD, considering bus and walking links
4.6 Multiple linear regression results (for models including coverage, overlapping degree and one connectivity indicator) for case 1 (only transit links) in the GVRD
4.7 Multiple linear regression results (for models including coverage, overlapping degree and one connectivity indicator) for case 2 (walking transfers between bus stops and transit links) in the GVRD
4.8 Multiple linear regression results (for models including the Local Index of Transit Availability) in the GVRD
4.9 CPMs using VKT as exposure in the GVRD
4.10 CPMs using VHKT as exposure in the GVRD
4.11 CPMs using TLKM as exposure in the GVRD

List of Figures

2.1 “The Seven Bridges of Konigsberg Problem”
2.2 Types of graphs
2.3 Example of calculation of connectivity indicators for various graphs
2.4 Nomograph between connectivity indicators α and γ (Black [1])
2.5 Example of calculations of the Average Edge Length η (Kansky [13])
2.6 Example of calculations of the θ-index (Kansky [13])
2.7 Categories of transit indicators (Vuchic and Musso [20])
2.8 Example of network coverage calculations with overlapping (Gattuso and Mirello [8])
2.9 Venn diagram differentiating metro network stations from graph vertices (Derrible and Kennedy [5])
2.10 Example of transformation from metro network to a graph
2.11 Example of multiple and single edges
2.12 Calculation of coverage area for metro networks
2.13 Typical areas: City, Served and Urban within a metro network (Derrible and Kennedy [5])
2.14 Graph of Complexity β versus Degree of Connectivity γ (Derrible and Kennedy [5])
2.15 Examples of metro networks according to state
2.16 Graph of Directness versus Structural Connectivity ρ (Derrible and Kennedy [5])
2.17 Average line length A versus Number of Stations NS (Derrible and Kennedy [5])
2.18 L.I.T.A. scores for service, frequency and capacity in the GVRD (Hariwal et al. [18])
2.19 Overall L.I.T.A. score for the GVRD (Hariwal et al. [18])
3.1 Map of the GVRD
3.2 Sociodemographic data for the Greater Vancouver Regional District (GVRD)
3.3 General map of the public transportation system in the GVRD (TransLink Web Page, 2011)
3.4 Map of the complete transit system in the GVRD (TransLink Web Page, 2011)
3.5 Transit data for the Greater Vancouver Regional District (GVRD)
3.6 Bus routes sequenced versus not sequenced in the GVRD
3.7 Example of conversion of a bus network into a graph
3.8 Example of comparison of connectivity calculations (β and γ) of three networks with different configurations, but the same number of edges E
3.9 Example of comparison of connectivity calculations (β and γ) for two networks with the same configuration but different route frequencies f
3.10 Example of a network composed of zones A and B
3.11 Example of a network composed of seven zones
4.1 GVRD metro
4.2 Vancouver metro plotted as part of the state graph
4.3 Vancouver metro plotted as part of the form graph
4.4 Vancouver metro plotted as part of the structure graph
4.5 Some zonal graphs obtained from the Greater Vancouver Regional District (GVRD)
4.6 Spatial distribution of the β-index in the GVRD
4.7 Spatial distribution of the β-index in the GVRD
4.8 Spatial distribution of the γ-index in the GVRD
4.9 Spatial distribution of the γ-index in the GVRD
4.10 Annual commuting transit trips per capita versus proposed complexity in the GVRD
4.11 Annual commuting transit trips per capita versus proposed complexity in the GVRD
4.12 Annual commuting transit trips per capita versus structural connectivity in the GVRD
4.13 Annual commuting transit trips per capita versus Local Index of Transit Availability in the GVRD
4.14 Annual commuting transit trips per capita versus Coverage in the GVRD
4.15 Annual commuting transit trips per capita versus Local Index of Transit Availability in the GVRD

Glossary

α           Degree of cyclicity
β           Complexity
β           Proposed complexity (option 1)
β           Proposed complexity (option 2)
β(ES)       Complexity (only configuration)
γ           Degree of connectivity
γ           Proposed degree of connectivity (option 1)
γ           Proposed degree of connectivity (option 2)
γ(ES)       Degree of connectivity (only configuration)
δ           Maximum number of transfers
ι           Iota-index
η           Average edge length
θ           Average length per node
µ           Cyclomatic number
π           π-index
ρ           Structural connectivity
σ           Coverage
τ           Directness
χ2          Pearson chi-squared
A           Average line length
AServed     Residential, employment, industrial and institutional area per zone
AREA        Zonal area
BDEN        Bus stop density
BTHOV       Length of bus only and 3+ HOV lanes
BUSHOV      Length of bus only HOV lanes
BUSP        Percentage of transit commuters
CPM         Collision Prediction Model
E           Number of edges
Ef          Number of edges normalized by frequency
EM          Number of multiple edges
ES          Number of single edges
f           Frequency
fAM         Frequency of the AM peak
fmax        Maximum sum of frequencies
FS          Number of far sided bus stops
G           Graph
GLM         Generalized linear regression modeling
GVRD        Greater Vancouver Regional District
ICBC        Insurance Corporation of British Columbia
IS          Length of interstation spacing
LITA        Local index of transit availability
M           Total mileage of network
MB          Number of mid block bus stops
NL          Number of metro lines
NL          Net length of bus routes
NR          Number of bus routes
NS          Number of near sided bus stops
NST         Number of bus stops
OPL         Total route length
OD          Overlapping degree
PDO3        Property damage only collisions in three years
POP         Population
POPD        Population density
S3          Severe collisions in three years
SD          Scaled deviance
TAZ         Traffic analysis zone
TWOHOV      Length of 2+ HOV lanes
THRHOV      Length of 3+ HOV lanes
T3          Total collisions in three years
TCM         Total commuters
TCPC        Transit commuting trips per capita
TKT         Total transit kilometers traveled
TLKM        Total lane kilometers
V           Number of vertices
VE          Number of end vertices
VT          Number of transfer vertices
VKT         Total transit and vehicle kilometers traveled
VHKT        Total vehicle kilometers traveled
WKG         Employment
WKGA        Employment density

Acknowledgments

Firstly, I want to thank my supervisor Dr. Tarek Sayed for his guidance, support, patience and optimism. I thank him for inspiring me with his passion and devotion to the field of transportation safety. I am glad we found a balance and were able to work together on a project we both enjoy. Secondly, I would like to thank my co-supervisor Dr. Mohamed Wahba for taking the time to review and give me invaluable comments on my many calculation attempts, and for reading this document so carefully that each version became better than the last. Even while he was unfortunately away from campus, he made himself available and had the patience to discuss this research with me a million times over the phone. In addition, I want to thank my colleagues Shweekar El-Bassiouni and Mohamed Elesawey, who were very helpful when I was starting my research. Despite being busy with their own research projects, they made time to help me learn statistical software (SAS) and mapping software (ArcGIS), answered my questions about collision prediction models and even lent me some useful books. My sincere thanks to TransLink, the Insurance Corporation of British Columbia and Census Canada for letting me use their databases, which made this research possible. Moreover, I want to thank my parents for their unconditional love and support, not only during this journey towards my master's degree, but through all of my life.
My parents have always been role models of hard work and perseverance, and have always been there for me with every project I attempted. I would not be here if it wasn’t for all their efforts and sacrifices. Special gratitude to my little sister Laura Quintero, firstly for taking the time to proofread this very long document, but mostly for being my companion since we started the adventure of moving together to Vancouver to pursue our respective degrees at UBC. Thanks to Laura I have learned so much, from small lessons, such as how to remove an oil-based paint stain from a carpet, to important lessons, such as knowing that reaching any goal is possible as long as you put in the necessary effort. Last but not least, I thank Carlos Uribe for having endless patience to teach me LyX (a friendly version of LaTeX) so this document would look impeccable and beautiful. One interesting thing about Carlos is that he was the first person from a non-engineering field who genuinely wanted to know what my research was about. I thank him for always encouraging and supporting me to finish this project, which at many points I thought was impossible to complete.

I dedicate this work to my family (my parents, sister and Carlos); they mean everything to me.

Chapter 1
Introduction

1.1 Background

In 2008 more than 50% of the world’s population lived in cities. Rapid urbanization is taking place: the world’s population growth between 1950 and 2050 has been estimated at 1.29% per year, while urban growth over the same period has been predicted at 2.35% per year (United Nations [21]). This rapid urbanization, in combination with rising vehicle costs and increased awareness of environmental issues, will likely generate increased transit usage in the future (Wiley et al. [28]).
Therefore, the challenge for planners and practitioners is to plan and design innovative, efficient, sustainable and safe systems.

One interesting research area in which new solutions for public transportation planning could be found is the study of networks. Network techniques involve the analysis of systems by viewing them as a graph composed of a set of vertices (or nodes) and edges (or links). Once the transport system is visualized as a graph, various network properties can be computed and evaluated, based on the relationships between the network elements (i.e. relationships between vertices and edges). In the case of transportation systems, network properties of interest include connectivity, coverage, directness and complexity, to name a few.

Current research efforts applying network analysis to public transportation systems have focused on the characterization and comparison of metro systems around the world; examples include Musso and Vuchic [20], Gattuso and Mirello [8] and Derrible and Kennedy [5]. The literature lacks an assessment of network properties for other types of transit systems (e.g. surface service). Moreover, network properties are always estimated for the network as a whole. In this study, a new analysis is proposed in which a bus system is characterized and analyzed at the zone level. By conducting the analysis at the zonal level, several advances in the calculation of network properties are introduced and new findings are reached.

In traditional approaches for computing network properties, only physical network characteristics are used (i.e. vertices and edges), while the influence of operational attributes of transit services (i.e. frequency, capacity versus demand ratio, service time span) is not accounted for.
This study will focus on developing transit indicators able to explain the association of frequency of bus routes with connectivity and complexity (indicators related to the physical characteristics of transit systems). The study will also investigate the influence of the newly developed indicators on ridership by developing macro-level ridership prediction models.

In addition, the need for tools that enable the evaluation of the safety of transit networks at the planning stage is growing. Recent research efforts include the development of techniques and decision tools which facilitate a proactive safety approach for the design of public transport networks and services. Planning-level collision prediction models are an example of such decision tools for practicing proactive safety. Most collision prediction models have been developed for auto collisions. Although some vehicle-to-vehicle collisions could be attributed to the presence of transit vehicles (e.g. stop-and-go behavior of transit vehicles, transit vehicles blocking the view of the road for auto drivers), auto collision prediction models traditionally do not include transit related explanatory variables (Cheung et al. [3]). Collision prediction models accounting for transit elements are scarce in the literature. Only two studies related to the development of transit safety models were found: Jovanis et al. [12] and Cheung et al. [3]. This study investigates the development of collision prediction models that explicitly incorporate transit elements and transit network properties at the zonal level as explanatory variables.

1.2 Objectives

The main goal of this study is to investigate the development of zonal level ridership models and zonal level transit-oriented safety models that use transit network properties as explanatory variables. The motivation for this research is mainly the lack of focus on network characteristics during the planning and design of public transportation systems.
The available network analysis tools currently fail to include transit operational attributes (i.e. frequency, capacity, capacity versus demand, among others) and give importance mostly to the relationship between vertices and edges. Also, the scarcity of transit-oriented safety estimation tools is a concern to both researchers and practitioners. Two objectives were identified for this research as a way to contribute to these two research problems:

1. Adapt and improve network design tools for their application in bus systems. This goal can be achieved as follows:
   (a) Re-define (or modify) the current transit indicators used to characterize metro systems, for the case of bus systems.
   (b) Propose new ways of measuring transit network characteristics by accounting also for operational factors.
   (c) Include transit network properties as independent variables in the development of ridership models.
2. Develop empirical tools useful for transit-oriented safety planning. This goal will be reached by incorporating transit physical and operational elements present in the roadways, together with transit network indicators, as explanatory variables in the development of macro-level collision prediction models.

The ridership models and collision prediction models will be developed for the Greater Vancouver Regional District (GVRD) public transportation system and its 577 traffic analysis zones.

1.3 Thesis Structure

This document is divided into five chapters. Chapter one is the introduction of the thesis, including a brief background, the research problem and the goals. Chapter two is the literature review, consisting of a review of the research efforts in the field of graph theory based transit indicators and of applications of indicators in metro network characterization and design.
Chapter two also includes a review of the reactive and proactive safety approaches, the current practices for the development of macro-level collision prediction models (CPMs) and previous modeling attempts, especially in the area of transit safety. Chapter three describes the data and methodology used to develop new graph theory based transit indicators. These techniques are used to develop macro-level ridership models and CPMs applying transit indicators as explanatory variables. Chapter four presents the results and findings obtained by applying the proposed methodology to the Greater Vancouver Regional District (GVRD) public transportation network. The spatial distribution of the various transit indicators estimated is discussed. Ridership prediction models and CPMs are explained, including a discussion of their goodness of fit, an interpretation of the relationships obtained and a comparison with logical expectations and previous modeling results from the literature. Finally, chapter five contains a summary of the main conclusions and contributions of the research, and proposals for future research topics.

Chapter 2
Literature Review

2.1 Characterization of Transit Networks Using Graph Theory Based Indicators

2.1.1 Introduction to Graph Theory

Graph theory has a wide range of applications, in fields such as chemistry, electrical engineering and computer science among others, but it was actually developed from an urban transportation problem (Derrible and Kennedy [5]). The basic concepts of graph theory originated in the 18th century with the solution of “The Seven Bridges of Königsberg” problem by the famous mathematician Leonhard Euler. Königsberg (now known as Kaliningrad) is a city located in Eastern Prussia, which has the Pregel River going across town, dividing the land into four parts (Figure 2.1(a)). In order to move around the city there were seven bridges crossing the river.
It was said that people used to entertain themselves trying to find a route that would cross all seven bridges just once. Euler used the first graph representation to illustrate the problem by drawing the bridges as links and the four parts of land as nodes, as shown in Figure 2.1(b). Based on Euler’s graph representation of the problem and the contributions of many other mathematicians afterward, the basic principles of graph theory were stated. Some of the fundamental graph theory concepts relevant to this particular research are explained below.

A graph G(V, E) consists of:
• A set of vertices V, which is composed of vertices vi.
• A set of links or edges E, where an individual edge eij is a continuous line between vertices vi and vj.
• An incidence function that associates the edges in E with pairs of vertices in V.

There are many types of graphs. For example, graphs can have a finite or infinite number of edges E and vertices V. There are two types of finite graphs: directed and undirected. Directed graphs have arrows which represent the direction in which movement takes place (see Figure 2.2(a)). In undirected graphs movement or flow happens in both directions (see Figure 2.2(b)). One sub-type of undirected graph is the tree, whose main feature is that it has zero circuits (a circuit can be defined as any closed path or loop in the graph). In addition, tree graphs are connected, and a circuit is formed if one extra edge is added. Moreover, a tree with v vertices will always have v − 1 edges (see Figure 2.2(c)). Graphs can also be classified as planar or non-planar. Planar graphs can be drawn on a plane surface so that edges intersect only at the vertices of the graph (see Figure 2.2(d)).

Figure 2.1: “The Seven Bridges of Konigsberg Problem”. Panels: (a); (b) Graph representation.
Non-planar graphs, by contrast, can be represented in a 2-D or higher dimensional space, and their edges can cross at places other than nodes (see Figure 2.2(e)).

Figure 2.2: Types of graphs. (a) Directed graph (b) Undirected graph (c) Tree graph (d) Planar graph (e) Non-planar graph

2.1.2 Early Graph Theory (1960’s): The First Connectivity Indicators

One important characteristic of a graph is its connectivity. The first indicator of connectivity was called the cyclomatic number µ (or 1st Betti number) and was proposed by the French mathematician Claude Berge in 1962. The cyclomatic number µ represents the number of circuits present in a graph. Alternatively, µ can also be understood as a measure of the number of alternative paths between pairs of vertices. It is defined mathematically as:

\[ \mu = e - (v - p) = e - v + p \tag{2.1} \]

where e is the number of edges (links), v the number of vertices (nodes) and p the number of non-connected graphs (isolated networks). A way of understanding the cyclomatic number formula is to consider a tree with v vertices (trees are explained in detail in Section 2.1.1), which will have v − 1 edges and no circuits (µ = 0). If one extra edge is added to the tree, a graph with one circuit (µ = 1) will be obtained. If more edges are added to a graph with v vertices, the graph will have e − (v − p) circuits. As can be seen, these “extra edges” are the ones that allow the formation of circuits in the graph. The minimum value of the cyclomatic number is zero for the case of trees, forests, null graphs and disconnected graphs (p > 1). As graphs get closer and closer to a completely connected state the cyclomatic number will increase, because these graphs will offer more alternative paths or circuits between pairs of vertices.
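As an illustration of Equation 2.1, the following minimal Python sketch computes the cyclomatic number of an undirected graph given as an edge list. The function names and the union-find approach used to count the isolated networks p are assumptions made for this example and are not part of the original study.

```python
def count_components(num_vertices, edges):
    """Count the connected components p using a simple union-find."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(x) for x in range(num_vertices)})


def cyclomatic_number(num_vertices, edges):
    """Cyclomatic number mu = e - v + p (Equation 2.1)."""
    p = count_components(num_vertices, edges)
    return len(edges) - num_vertices + p


# A tree with 4 vertices has v - 1 = 3 edges and no circuits.
tree = [(0, 1), (1, 2), (1, 3)]
# Adding one extra edge to the tree creates exactly one circuit.
one_circuit = tree + [(2, 3)]

print(cyclomatic_number(4, tree))         # 0
print(cyclomatic_number(4, one_circuit))  # 1
```

Vertices are labeled 0 to v − 1 here purely for convenience; any labeling that can be mapped to integers would work in the same way.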
The maximum cyclomatic number µmax is the difference between the maximum number of edges emax and the number of edges in a tree, v − 1:

\[ \mu_{max} = e_{max} - (v - 1) \tag{2.2} \]

For non-planar graphs emax = v(v − 1)/2 and for planar graphs emax = 3(v − 2), obtaining:

\[ \mu_{max} = \begin{cases} \dfrac{v(v-1)}{2} - (v - 1) & \text{Non planar} \\ 3(v - 2) - (v - 1) = 2v - 5 & \text{Planar} \end{cases} \tag{2.3} \]

As explained before, the cyclomatic number µ ranges between zero and an upper bound µmax, which is a function of the number of vertices (see Equation 2.3). Different graphs will therefore have cyclomatic numbers µ with different upper bounds, which does not allow for comparisons or ranking of those graphs.

In 1965, Garrison and Marble tried to apply graph theory principles to transportation networks and pointed out the necessity of connectivity indices with common bounds to allow for comparisons between networks. They developed three indices for connectivity: alpha α, gamma γ and beta β, where α and γ have common bounds. The first indicator, alpha α, represents the ratio of the actual number of circuits (the cyclomatic number µ) to the maximum number of circuits in the graph µmax. Alpha α is also called the “degree of cyclicity” and is defined as:

\[ \alpha = \frac{\mu}{\mu_{max}} = \frac{e - v + p}{e_{max} - (v - 1)} \tag{2.4} \]

\[ \alpha = \begin{cases} \dfrac{e - v + p}{v(v-1)/2 - (v - 1)} & \text{Non planar} \\ \dfrac{e - v + p}{3(v - 2) - (v - 1)} = \dfrac{e - v + p}{2v - 5} & \text{Planar} \end{cases} \tag{2.5} \]

Completely connected networks (having the maximum number of edges emax) will have α = 1, while networks with a decreasing number of edges will have their alpha α indicator approaching zero, and trees with no circuits will have an alpha α equal to zero.

The second indicator proposed by Garrison and Marble is called gamma γ and is typically referred to as the “connectivity index”. Gamma γ represents the ratio of the actual number of edges e to the maximum number of edges emax in a graph:

\[ \gamma = \frac{e}{e_{max}} \tag{2.6} \]

As mentioned before, for non-planar graphs emax = v(v − 1)/2 and for planar graphs emax = 3(v − 2). Therefore:

\[ \gamma = \begin{cases} \dfrac{e}{v(v-1)/2} & \text{Non planar} \\ \dfrac{e}{3(v - 2)} & \text{Planar} \end{cases} \tag{2.7} \]

Similar to alpha α, the gamma-index γ values are bounded between 0 and 1. A completely connected network will have a gamma γ equal to one and a completely disconnected network will have a gamma γ of zero.

The last index developed by Garrison and Marble is the beta-index β, which is often called complexity in the literature. The beta-index β is the average number of links per vertex and is written mathematically as:

\[ \beta = \frac{e}{v} \tag{2.8} \]

Networks with a complex structure will have high beta values and, conversely, simple structures will have low beta values (Kansky [13]). The beta-index β range is 0 ≤ β < ∞ for non-planar graphs and 0 ≤ β < 3 for planar graphs. The minimum value of the beta-index βmin occurs when graphs have no edges. The maximum possible values for beta βmax for the non-planar and planar cases are calculated below in more detail:

\[ \beta_{max} = \frac{e_{max}}{v} = \begin{cases} \dfrac{v(v-1)/2}{v} = \dfrac{v - 1}{2} & \text{Non planar} \\ \dfrac{3(v - 2)}{v} = 3 - \dfrac{6}{v} \approx 3 & \text{Planar} \end{cases} \tag{2.9} \]

In addition, trees and disconnected graphs will always have β < 1. It is interesting to note that the concept of the beta-index β is very similar to a concept from graph theory called the degree (or valency) of a vertex deg(v). The degree of a vertex deg(v) is defined as the number of edges incident to a vertex. In fact, twice the value of the beta-index β equals the average degree of the vertices in a graph (Black [1]). Finally, Figure 2.3 shows the cyclomatic number µ, alpha α, gamma γ and beta β for graphs in various stages of connectivity.

Garrison and Marble also studied the relationship of the three connectivity indicators with each other. They concluded that the indicators are highly correlated, and therefore there is no need to use more than one of them in the same study (Black [1]). Figure 2.4 shows a graph for calculating the value of γ (or α) based on the values of α (or γ), the number of vertices v and edges e.
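The three Garrison and Marble indices defined above can be sketched in a few lines of Python. This is only an illustrative example: the function name, its parameters and the choice to require at least three vertices (so that µmax is never zero) are assumptions made here, not part of the original study.

```python
def connectivity_indices(v, e, p=1, planar=True):
    """Return (alpha, gamma, beta) for a graph with v vertices, e edges
    and p isolated sub-networks, following Equations 2.4 to 2.8."""
    if v < 3:
        # With fewer than three vertices mu_max is zero and alpha is undefined.
        raise ValueError("at least three vertices are required")
    # Maximum number of edges for planar and non-planar graphs.
    e_max = 3 * (v - 2) if planar else v * (v - 1) / 2
    mu = e - v + p              # cyclomatic number (Equation 2.1)
    mu_max = e_max - (v - 1)    # maximum cyclomatic number (Equation 2.2)
    alpha = mu / mu_max         # degree of cyclicity
    gamma = e / e_max           # degree of connectivity
    beta = e / v                # complexity
    return alpha, gamma, beta


# A tree with 5 vertices and 4 edges: no circuits, so alpha is zero.
print(connectivity_indices(v=5, e=4))

# A completely connected non-planar graph with 5 vertices has 10 edges.
print(connectivity_indices(v=5, e=10, planar=False))
```

In the first call alpha is zero and beta is below one, as expected for a tree; in the second call both alpha and gamma reach their common upper bound of one.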
As illustrated in Figure 2.4, the values of the gamma γ and alpha α indices become almost identical for a high number of vertices.

Figure 2.3: Example of calculation of connectivity indicators for various graphs.

Figure 2.4: Nomograph between connectivity indicators α and γ. (Black [1])

Figure 2.5: Example of calculations of the Average Edge Length η for (a) Zone A, (b) Zone B and (c) Zone C. (Kansky [13])

2.1.3 Early Graph Theory (1960’s): Kansky

During the 1960’s Kansky worked on transportation network indicators and tried to relate them to economic development (Derrible and Kennedy [5]). Kansky developed four indicators called eta η, pi π, theta θ and iota ι. The first indicator, eta η, is known as the average edge length. The equation for eta η is as follows:

\[ \eta = \frac{M}{e} \tag{2.10} \]

Where M is the total mileage of the network and e is the number of edges. Changing the definition of a vertex changes the number of edges and can therefore alter the value of eta η. In Figure 2.5, a network of mileage M is studied and vertices are defined as urban centers. Comparing networks A, B and C shows that the more urban centers are considered as transportation vertices, the lower the eta-index η.

The second indicator is called the pi-index π. The number π expresses the ratio of the circumference C of a circle to its diameter d (π = C/d). For a transportation network, the circumference C is analogous to the total mileage of the system M, and the diameter d corresponds to the diameter of the system in miles. Typical values of the pi-index π for transportation networks are greater than or equal to one. The pi indicator π essentially represents the spread of the system.
A good network will have adequate mileage M but a small diameter, so that travel times are reduced and distances between the furthest pair of points are short.

The third indicator is called the theta-index θ and is the ratio of the network traffic to the number of vertices:

\[ \theta = \frac{T}{v} \tag{2.11} \]

Where T is the total traffic flow and v is the number of vertices of the network. Alternatively, T can be expressed as the total volume of freight carried. Another approach for calculating theta θ is based on the total mileage of the system M:

\[ \theta = \frac{M}{v} \tag{2.12} \]

Figure 2.6: Example of calculations of the θ-index for (a) Zone A and (b) Zone B. (Kansky [13])

In this case theta θ represents the length per vertex. The theta index θ has an interesting property: it gives information not only about the length of the network, but also about its structure and connectivity. As seen in Figure 2.6, two networks with the same eta index η but different structures have different values of the theta index θ.

The last indicator is called iota ι and is expressed mathematically as:

\[ \iota = \frac{M}{w} \tag{2.13} \]

Where M is the total mileage of the network and w is the number of vertices weighted by a function (for example, larger roads can be given higher weights). Because the vertices are weighted, the iota indicator seems more meaningful than the theta index θ. However, the validity of iota ι depends on the justification for the scheme used to weight the vertices (Kansky [13]). Typically, the iota indicator ι as defined above is used when no traffic flow information is available, under the assumption that the network’s structure reflects the traffic patterns (Kansky [13]).
On the other hand, if traffic flow information is available, iota ι can be estimated as the ratio between total mileage M and the traffic flow T:

\[ \iota = \frac{M}{T} \tag{2.14} \]

The second definition of iota represents the density of the traffic flow in the network, or the average distance per vehicle or ton of freight.

2.1.4 Traditional Applications of Graph Theory to Transit Network Design

In the 1980’s Musso and Vuchic contributed significantly to metro network measures and indicators. They proposed an extensive list of indicators and classified them in four categories: 1) network size and form, 2) network topology, 3) relationship to the city and 4) service measures and utilization.

The network size and form indicators included:
• ni: number of stations (vertices) on line i.
• ai: number of inter-station spacings (edges) on line i.
• li: length of line i.
• Number of multiple stations $n_m^k$ (used by more than one line), number of multiple spacings $a_m^k$ and their lengths $l_m^k$, where k denotes the number of lines using a station or spacing.
• nl: number of lines in the network.
• Number of stations in the network N: the sum of the stations on individual lines minus the repeated counts of multiple stations (a station used by k lines is counted k times in the first sum, so k − 1 counts are removed):

\[ N = \sum_{i=1}^{n_l} n_i - \sum_{k=2}^{k_{max}} (k - 1)\, n_m^k \tag{2.15} \]

• Number of station spacings in the network A: the sum of the spacings on individual lines minus the repeated counts of multiple spacings:

\[ A = \sum_{i=1}^{n_l} a_i - \sum_{k=2}^{k_{max}} (k - 1)\, a_m^k \tag{2.16} \]

• Route length of the network L: the sum of the line lengths minus the repeated lengths of multiple sections:

\[ L = \sum_{i=1}^{n_l} l_i - \sum_{k=2}^{k_{max}} (k - 1)\, l_m^k \tag{2.17} \]

• Number of circles (i.e. cyclomatic number µ).
• Number of station-to-station travel paths OD: consists of direct paths ODd and paths involving one or two transfers ODt. In a network with N stations the number of all possible OD paths is:

\[ OD = \frac{N(N - 1)}{2} \tag{2.18} \]

For a line with ni stations the number of all possible OD paths is:

\[ OD_i = \frac{n_i (n_i - 1)}{2} \tag{2.19} \]

The total number of direct paths ODd in a network equals the sum of the paths along the lines plus the number of paths on multiple lines (sections serving two or more lines):

\[ OD_d = \frac{1}{2} \sum_{i=1}^{n_l} n_i (n_i - 1) + \sum_{j} n_{m_j}\, n_{b_j} \tag{2.20} \]

Where nm is the number of multiple stations and nb is the number of single stations. Finally, the number of transfer paths is obtained as the difference between the total paths and the direct paths:

\[ OD_t = OD - OD_d \tag{2.21} \]

The network topology indicators studied were:
• Average inter-station spacing S: the ratio of the total route length L to the number of inter-station spacings A. Mathematically it is defined as:

\[ S = \frac{L}{N - n_l + \sum_{k=2}^{k_{max}} n_m^k} = \frac{L}{A} \tag{2.22} \]

The spacing between stations S is typically a trade-off between good coverage (short spacing between stops) and high speed (long spacing between stops) (Vuchic and Musso [20]).
• Line overlapping λ: the ratio of the sum of the line lengths to the total network length. Using Equation 2.17 it can be written as:

\[ \lambda = \frac{\sum_{i=1}^{n_l} l_i}{L} = \frac{L + \sum_{k=2}^{k_{max}} (k - 1)\, l_m^k}{L} \tag{2.23} \]

Networks with single lines will have λ = 1, while systems with more interconnected lines will have a higher overlapping (Vuchic and Musso [20]).
• Directness of service δ: the proportion of OD paths that can be made without transfers. The formula for δ is:

\[ \delta = \frac{OD_d}{OD_d + OD_t} = \frac{OD_d}{OD} \tag{2.24} \]

The values of directness of service δ range between 0 and 1; δ = 1 occurs when the network is composed only of single lines. More interconnected networks have lower values of δ.

The indicators previously proposed by Garrison and Marble (explained in Section 2.1.2) were also studied by Musso and Vuchic [20] as part of the network topology category:
• Circle availability (i.e. the α-index)
• Network complexity (i.e. the β-index)
• Network connectivity (i.e. the γ-index)

The indicators in the relationship to the city category contrast the size and number of stations of metro networks with the characteristics of the city, such as population and size. These indicators also evaluate the role of the metro compared to other transportation modes available in the city (Musso and Vuchic [20]). The most important indicators in this group include:
• Density of the metro network La: the ratio of network length L to the area of the city served Su. La represents the extension of the network with respect to the area it serves (Vuchic and Musso [20]). The indicator is expressed as:

\[ L_a = \frac{L}{S_u} \tag{2.25} \]

• Network extensiveness per population Lp: the ratio of network length L to the population living in the served area Pu. Lp is expressed mathematically as:

\[ L_p = \frac{L}{P_u} \tag{2.26} \]

When comparing cities with the same population, a higher Lp means a more extensive metro system (Vuchic and Musso [20]).
• Area coverage Na: the proportion of the urban area Su that is located within walking distance of metro stations:

\[ N_a = \frac{n S_i}{S_u} \tag{2.27} \]

Where Si is the circular area around a station with a radius of 400 m.
• Street transit integration ratio nt: the ratio of transit lines that have transfers to the metro nts to all transit lines ns. This indicator explains the role of the metro within the whole public transportation system (Vuchic and Musso [20]). The indicator is expressed as:

\[ n_t = \frac{n_{ts}}{n_s} \tag{2.28} \]

• Auto access integration na: the ratio of stations with park and ride Np to the number of network stations N. It is expressed as:

\[ n_a = \frac{N_p}{N} \tag{2.29} \]

The indicators in the services and utilization category include principally level of service (speed and frequency) and system performance (design and scheduled capacities, plus performed work).
Utilization indicators, on the other hand, affect the economic efficiency of operations and are strongly related to the design of metro lines and the topology of metro networks. Other utilization indicators measure the intensity of usage of metro systems (Musso and Vuchic [20]). The indicators are listed below.

Services indicators:
• Operating speed weighted by veh-km per time (day) VCV: expressed mathematically as

\[ V_{CV} = \frac{\sum_{i=1}^{n_l} w_i V_{oi}}{\sum_{i=1}^{n_l} w_i} \tag{2.30} \]

Where wi are the weighting values (veh-km per time) and Voi is the operating speed on line i.
• Frequency of service during peaks weighted by stations fw: the equation is

\[ f_w = \frac{\sum_{i=1}^{n_l} f_i n_i}{\sum_{i=1}^{n_l} n_i} \tag{2.31} \]

Where fi is the frequency of service of route i during peaks and ni, the number of stations on line i, is the weight.
• Highest design line capacity in network C: represents the maximum capability of a transit line i to move passengers and has units of passengers per hour. The formula is as follows:

\[ C = f_{i\,max}\, n_{TU}\, C_v \tag{2.32} \]

Where fimax is the maximum frequency on line i, which is determined by the shortest headway hmin achievable at all sections (with or without stations) along the transit line. Mathematically fimax is expressed as:

\[ f_{i\,max} = \frac{3600}{h_{min}} \tag{2.33} \]

with hmin in seconds, giving fimax in transit units per hour. nTU is the number of vehicles per transit unit; for single-vehicle operation (i.e. only buses) nTU = 1. Cv represents the vehicle capacity, expressed in spaces per vehicle. Vehicle capacity Cv can also be defined as the sum of the number of seats and the number of standing spaces in the transit vehicle.
• Maximum scheduled line capacity CS: expressed as

\[ C_S = \max( f_i\, n_{TU}\, C_v ) \tag{2.34} \]

• Line capacity utilization coefficient ηc: expressed as

\[ \eta_c = \frac{C_S}{C} \tag{2.35} \]

• Riding habit (annual trips per capita) R: expressed as

\[ R = \frac{P_{ay}}{P} \tag{2.36} \]

where Pay are the annual passengers and P is the population.
• Passengers per year per network length RL: expressed as

\[ R_L = \frac{P_{ay}}{L} \tag{2.37} \]

where L is the network length.
• Passenger-km per day.
• Passenger-km per day over space-km per day a: expressed as

\[ a = \frac{P_{ad}}{S_d} \tag{2.38} \]

Where Sd is the space-km per day.
• Metro daily passengers as a share of transit daily passengers PM: expressed as

\[ P_M = \frac{P_{ad}}{P_t} \tag{2.39} \]

Where Pt is the total transit passengers.

2.1.5 Recent Applications of Graph Theory to Transit Characterization & Comparison: Gattuso and Mirello

In 2005, Gattuso and Mirello recalled indicators from previous authors (i.e. Kansky [13], Vuchic and Musso [20], Garrison and Marble [7]) and classified them into three categories: 1) topological, 2) geographical and 3) performance indicators (see Figure 2.7). They calculated indicators for 13 metro networks (located in cities in Europe and New York). Their principal contribution, however, was a weighting scheme for the stations (using the number of transfers as the criterion), called the relative node weight pri, together with two new indicators called node range of influence Ri and network covering. The new indicators are of particular interest because they describe the properties of metro networks relative to their geographical location in the city (Derrible and Kennedy [5]). The indicators are explained in more detail below:
• Node’s range of influence Ri: estimates the attractiveness, or width of the range of influence, of a vertex considering three aspects:
1. The first aspect is the geographical position, which defines how far the node is from the city center. The city is divided into three areas formed by three concentric circles of increasing radius; the first area is the “center”, the second area is called the “first corona” and the third area is the “second corona”.
2. The second aspect is the relative node weight pri, which is the weight of vertex i relative to all vertices in the network. It is calculated based on the number of connections at the vertex (Derrible and Kennedy [5]).
3. The third aspect is the directness, expressed as the ratio of the destinations connected directly by station i to the average number of direct connections in the network (nDi/ND).

The node range of influence Ri can be expressed mathematically as:

\[ R_i = R_b\, a_i \left( a_2 \frac{n_{Di}}{N_D} + a_3\, p_{ri} \right) \tag{2.40} \]

Where Rb is the maximum walking distance to a metro station, typically 500 m, and ai is based on the geographical position of the node: if node i is located in the center then ai = 0.5, in the “first corona” ai = 1 and in the “second corona” ai = 1.5. pri is the relative node weight, and a2 and a3 are coefficients that weight the contributions of the directness and the relative node weight, with a2 = 0.65 and a3 = 0.35. In short, the node range of influence Ri of a metro station depends on its geographic position (nodes in the suburbs have longer access distances) and on its importance (based on the number of links and direct connections at the station).
• Network covering: the coverage of the network using the node range of influence Ri as the radius for each node i. Areas of coverage that overlap are subtracted so that only the effective area is counted (see Figure 2.8).

Figure 2.7: Categories of transit indicators. (Vuchic and Musso [20])

Figure 2.8: Example of network coverage calculations with overlapping.
(Gattuso and Mirello [8])

2.1.6 Recent Applications of Graph Theory to Transit Characterization & Comparison: Derrible and Kennedy [5]

The main contribution made by Derrible and Kennedy was the characterization of 33 metro networks based on indicators from previous authors and the creation of two new indicators. In addition, Derrible and Kennedy [5] developed a relationship between transit indicators and ridership.

Characterization of metro networks using transit indicators

In order to characterize the 33 metro networks, Derrible and Kennedy recalled a methodology proposed by Berge in the 1960’s to transform transportation networks into graphs. The first assumption in the methodology is that metro networks can be represented as undirected graphs with V vertices and E edges. The undirected assumption is based on the fact that most metro networks have two-way lines. The methodology (explained below) then defines vertices (nodes) and edges (links), giving criteria to decide which stations become vertices and how rail lines are counted when transforming the network into a graph.
• Stations to vertices: there are two possible approaches, either to consider all stations NS as vertices or to consider only transfer and end stations as vertices. Derrible and Kennedy used the second approach, where transfer vertices are defined as stations where it is possible to change lines (either by changing platforms or by walking) without leaving the system. End vertices are the last stations of a rail line, where there is no possibility to transfer to another line. The remaining stations, which offer no transfers and are not terminals, are called intermediate stations and are not considered as vertices of the graph.
Mathematically the number of vertices V in a graph can be expressed as:

\[ V = V^T + V^E \tag{2.41} \]

Figure 2.9 depicts the difference between the actual number of stations in the metro network NS and the number of vertices V (transfer V^T and end V^E) in a graph representation. Figure 2.10 shows a simple example of a metro network and how its stations are transformed into vertices. Note that stations 1 and 2 are considered transfer vertices, since they are stations where switching from Line 1 to Line 2 (or vice versa) is possible. Station 4 is considered an end vertex, as it is the first and last station of Line 2 and no transfers to other lines are available. Station 3 is not considered a vertex since it is just an intermediate station.

Figure 2.9: Venn diagram differentiating metro network stations from graph vertices. (Derrible and Kennedy [5])

• Lines to edges: rail lines can be represented as edges E in a graph; edges can be either single E^S or multiple E^M. Mathematically the number of edges E in a graph can be expressed as:

\[ E = E^S + E^M \tag{2.42} \]

Figure 2.11 shows another example of a simple metro network. In this case vertices 2 and 3 are linked by two rail lines, so there is one single edge and one multiple edge. In contrast, there are three rail lines between vertices 1 and 2, so one single edge and two multiple edges are used to represent this scenario. When rail lines linking the same pair of vertices have a different number of intermediate stations between them, the rail lines are counted as single edges.

Figure 2.10: Example of transformation from (a) a metro network to (b) a graph.

Figure 2.11: Example of multiple and single edges.

Based on the methodology explained above, Derrible and Kennedy transformed 33 metro networks into graphs and then characterized them based on various indicators.
Some of these indicators were proposed by previous authors, including:
• Degree of connectivity γ (already explained in Section 2.1.2).
• Complexity β (already explained in Section 2.1.2).
• Network length L: the sum of the lengths of all rail lines in the network.
• Number of lines nL: the number of rail lines available in the network.
• Average line length A: can be expressed mathematically as A = L/nL.
• Number of stations NS: the total number of stations, including transfer, end and intermediate stations.
• Inter-station spacing S: the average distance between stations, expressed as S = L/NS.
• Coverage σ: represents the proportion of land area served by the transit network. It is the sum of the circular areas (of an arbitrarily selected radius r) surrounding the metro stations over the total urban area served AServed (see Figure 2.12):

\[ \sigma = \frac{\pi r^2 N_S}{A_{Served}} \tag{2.43} \]

Where r is the radius of the circular area surrounding each station. Derrible and Kennedy used a standard value of r = 500 m, but depending on the situation the radius r can range between 400 m and 1 km. The particular value of r matters little, however, because it enters the equation as a constant and the indicator values are used only for comparison between networks. The area served AServed is calculated as the ratio between population and population density, using the average of the city and urban values for both quantities. The area served is expressed as follows:

\[ A_{Served} = \frac{(P_{City} + P_{Urban})/2}{(PD_{City} + PD_{Urban})/2} \tag{2.44} \]

where P is population and PD is population density. Figure 2.13 shows graphically the difference between the city, urban and served areas within a metro network.

The two new indicators proposed by Derrible and Kennedy [5] are described below:
Figure 2.12: Calculation of coverage area for metro networks.

Figure 2.13: Typical areas: City, Served and Urban within a metro network. (Derrible and Kennedy [5])

• Directness τ: an indicator for directness must decrease with the maximum number of transfers δ and be related to the number of lines nL. It is defined as:

\[ \tau = \frac{n_L}{\delta} \tag{2.45} \]

Where the maximum number of transfers δ is the number of transfers required to travel between the two furthest vertices of the network along the shortest path. Note that the maximum number of transfers δ is equivalent to a concept from graph theory called network diameter.
• Structural connectivity ρ: measures the number of connections available in the system. In order to calculate ρ, another term called the number of transfer possibilities Vct must be defined first. The number of transfer possibilities for a vertex i is the number of lines crossing the vertex minus one. The number of transfer possibilities for the network is the sum of the transfer possibilities at all nodes:

\[ V_c^t = \sum_{i} (l - 1)\, v_{i,l} \tag{2.46} \]

Where l is the number of lines passing through vertex i and vi,l is an element of the vertices V in graph G(V, E). The structural connectivity can be expressed as:

\[ \rho = \frac{V_c^t - E^M}{V^T} \tag{2.47} \]

Where E^M is the number of multiple edges and V^T is the number of transfer vertices. The numerator represents the net number of transfer possibilities. Subtracting the number of multiple edges E^M avoids double counting of transfer possibilities on overlapping lines (Derrible and Kennedy [5]). Dividing the numerator by V^T normalizes the indicator and thus makes it independent of the network size.
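The directness τ and structural connectivity ρ of Equations 2.45 to 2.47 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the input format (a list with the number of lines passing through each vertex) and the example network are hypothetical and do not come from Derrible and Kennedy [5].

```python
def transfer_possibilities(lines_per_vertex):
    """Sum of (l - 1) over all vertices, Equation 2.46."""
    return sum(l - 1 for l in lines_per_vertex)


def structural_connectivity(lines_per_vertex, multiple_edges):
    """rho = (Vct - E^M) / V^T, Equation 2.47.

    Transfer vertices are assumed to be those crossed by two or more lines.
    """
    v_t = sum(1 for l in lines_per_vertex if l >= 2)
    if v_t == 0:
        raise ValueError("network has no transfer vertices")
    return (transfer_possibilities(lines_per_vertex) - multiple_edges) / v_t


def directness(n_lines, max_transfers):
    """tau = nL / delta, Equation 2.45."""
    if max_transfers <= 0:
        raise ValueError("max_transfers must be positive")
    return n_lines / max_transfers


# Hypothetical network: 3 lines, three transfer vertices crossed by
# 2, 2 and 3 lines, four end vertices, one multiple edge, delta = 2.
lines_at_vertices = [2, 2, 3, 1, 1, 1, 1]
rho = structural_connectivity(lines_at_vertices, multiple_edges=1)
tau = directness(n_lines=3, max_transfers=2)
print(rho, tau)
```

End vertices contribute zero transfer possibilities, so only the transfer vertices affect the numerator of ρ in this sketch.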
Categories of Transit Indicators

Transit indicators can be classified into three categories (Derrible and Kennedy [5]):
• State: indicators in this category include Garrison and Marble’s complexity β and the degree of connectivity γ. The state category represents the stage of development of a network and differentiates between simple and complex systems. According to Derrible and Kennedy, metro networks go through three phases of development (Figures 2.14 and 2.15). Phase I occurs when the network is created and starts developing (β = 1.3, γ = 0.5). At this point the relationship between β and γ is not very strong. During Phase II some vertices are added to the network and new edges are formed as well (β = 1.6, γ = 0.6). Phase II metro networks also show a stronger relationship between β and γ, as they are more complex and connected. Finally, in Phase III, once the network has expanded sufficiently, the complexity β remains constant at a value of approximately two and the degree of connectivity reaches a maximum value of 66% (β = 1.96, γ = 0.66).

Figure 2.14: Graph of Complexity β versus Degree of Connectivity γ. (Derrible and Kennedy [5])

• Structure: this category is composed of the directness τ and the structural connectivity ρ indicators. The structure category describes whether a network is focused on providing directness, on providing connectivity, or is integrated (focused on both connectivity and directness), as shown in Figure 2.16. Among the 33 networks studied, the most connected is the Buenos Aires metro, although it also presents the lowest directness. In contrast, the least connected network is located in Brussels, while the most direct networks are in London and Tokyo, as they have a high number of rail lines but few transfers.
A good practice for metro design is to aim for an integrated system, one that provides both strong connectivity ρ and directness τ. Figure 2.16 shows that there is no trade-off between the two indicators and that an integrated network can be achieved regardless of the size of the network (Derrible and Kennedy [5]).
• Form: this category is formed by indicators that help distinguish between regionally and locally oriented networks. Such indicators include: network length L, number of lines nL, average line length A, number of stations NS and inter-station spacing S. As shown in Figure 2.17, metro networks can be classified in three zones according to the average line length A and the number of stations NS. The first zone is called “Regional Accessibility”, where the average line length A is high and the number of stations NS is small. Systems in the “Regional Accessibility” zone are therefore focused on moving people from the suburbs to the city core. The second zone is “Local Coverage”, in which rail lines are short and there is a high number of stations. Such networks specialize in serving passengers for trips inside the downtown area. Finally, the zone named “Regional Coverage” is a mixture of the other two. It gives good service to people in the suburbs traveling to the city core, as well as to internal trips within the city core.

Figure 2.15: Examples of metro networks according to state: (a) Phase I, (b) Phase II, (c) Phase III.

Figure 2.16: Graph of Directness versus Structural Connectivity ρ. (Derrible and Kennedy [5])

Ridership and transit indicators

Derrible and Kennedy developed a relationship between transit indicators and ridership based on 19 metro networks located around the world.
In this study they found that ridership Bpc (expressed in boardings per capita) was strongly related to three transit indicators: coverage σ, structural connectivity ρ and directness τ. The result was the multiple linear relationship presented below:

\[ B_{pc} = 44.963 \ln\sigma + 7.579\,\tau + 92.316\,\rho + 102.947 \tag{2.48} \]

The relationship shows that the three indicators have almost the same weight or influence on ridership. Therefore, designs should focus on achieving maximum coverage and connectivity, while maintaining the directness of trips. This relationship is a useful tool for comparing diverse metro network designs and choosing the ones that maximize ridership. It also indicates that ridership depends not only on cultural characteristics or city design, but also on network characteristics (Derrible and Kennedy [5]).

Figure 2.17: Average line length A versus Number of Stations NS. (Derrible and Kennedy [5])

2.1.7 Other Non-Graph Theory Based Transit Network Indicators: Local Index of Transit Availability

The Local Index of Transit Availability (LITA) is an indicator used to evaluate the intensity of transit in each traffic analysis zone (TAZ). The LITA was proposed by Rood in 1998 and was first implemented for Riverside County, California. It has also been used to measure transit performance in Bradford, England (Pennycook et al. [22]), Vancouver, Canada (Hariwal et al. [18]) and the Greater Toronto and Hamilton area (Wiley et al. [28]). The LITA is computed from three separate scores: frequency of service, capacity and coverage. Each score calculation is explained in more detail below.
• Frequency of service fi: the ratio of the total number of transit vehicles vi entering zone i to the area of developed land ai in zone i.
\[ f_i = \frac{v_i}{a_i} \tag{2.49} \]

• Capacity ci: the ratio of the daily available transit seats on all routes traveling through the zone to the total number of potential riders (residents and employees).

\[ c_i = \frac{(v_i \, s_i)\,(t_i + 0.5\, r_i)}{P_i + E_i} \tag{2.50} \]

Where si is the number of seats of the daily buses entering zone i, Pi is the population of zone i, Ei is the employment in zone i, ti is the length of two-way routes operating inside zone i and ri is the length of routes on the border of zone i. Note that the length of routes on the border between two zones is multiplied by 0.5 for each zone.
• Coverage gi: measured as the density of transit stops within the zone, as shown in Equation 2.51.

\[ g_i = \frac{o_i + 0.5\, q_i}{a_i} \tag{2.51} \]

Where oi is the number of transit stops inside zone i and qi is the number of transit stops located on the border of zone i. Note that stops located on the border between two zones are multiplied by a factor of 0.5 for each zone.

Once all three scores are calculated for each zone, each score is standardized to produce a z-score. Finally, to obtain the overall LITA score, the three z-scores are averaged for each zone. To avoid negative z-scores, previous studies added an arbitrary value of 5 to each score. A grade between “A” and “F” was then assigned depending on the values. However, this grading system resulted in zones not being evaluated adequately: zones with no transit at all end up with a grade of “D”, while zones with either low or high transit availability obtain a grade of “C”. Due to these issues, Wiley [28] proposed changing the grading system by classifying the standardized z-scores into quintiles (20% of the zones in each). Level 1 then includes the lowest 20% of z-scores, level 2 includes values between the 20th and 40th percentiles, and so on for levels 3, 4 and 5. The description of each level in terms of the transit service provided is as follows:
• Level 1: No service or extremely limited availability
• Level 2: Sparse to less than average levels of availability
• Level 3: Average levels of availability
• Level 4: Average to good levels of availability
• Level 5: Excellent levels of availability, best in region

Application of LITA in the Greater Vancouver Regional District (GVRD)

As mentioned before, the LITA indicator was computed for the GVRD transit system in 2009 by Hariwal et al. [18]. The analysis was based on the sociodemographic data and zoning tracts from the Census 2010, together with the bus operational information (timetables, frequencies, etc.) obtained from the official TransLink website. The resulting LITA scores obtained by Hariwal et al. [18], broken down by capacity, frequency and service (also called coverage), are presented in Figure 2.18. The service score is highest when leaving the CBD. It varies by region: the highest service scores occur on the east side of the CBD, as well as in the west and south-east parts of the region, while the central part of the region and the CBD have low to medium service scores. The frequency and capacity maps, on the other hand, are very similar to each other and show low and medium scores across most of the region. The highest frequency and capacity scores occur along the rail lines (the Skytrain lines specifically) and in the west side of the region. Finally, the overall L.I.T.A. score (see Figure 2.19) presents its highest values inside the CBD, in the east and south areas right next to the CBD, and in the west part of the region. The L.I.T.A. score then decreases gradually when traveling from the downtown core to the suburban areas.

Figure 2.18: L.I.T.A. scores for service, frequency and capacity in the GVRD. (Hariwal et al. [18])
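The LITA procedure described in Section 2.1.7 (three zone scores, standardization, averaging and classification into five levels) can be sketched as follows. This is a minimal illustration, not the implementation used by Rood, Wiley or Hariwal et al.: the function names, the dictionary keys and the five example zones are hypothetical, the population standard deviation is assumed for the z-scores, and levels are assigned by rank so that each level holds about 20% of the zones.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values (population standard deviation assumed)."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(x - m) / sd for x in values]


def lita(zones):
    """Average of the frequency, capacity and coverage z-scores per zone."""
    freq = [z["v"] / z["a"] for z in zones]                      # Eq. 2.49
    cap = [z["v"] * z["s"] * (z["t"] + 0.5 * z["r"]) / (z["P"] + z["E"])
           for z in zones]                                       # Eq. 2.50
    cov = [(z["o"] + 0.5 * z["q"]) / z["a"] for z in zones]      # Eq. 2.51
    return [(f + c + g) / 3
            for f, c, g in zip(zscores(freq), zscores(cap), zscores(cov))]


def quintile_levels(scores):
    """Assign levels 1 (lowest 20%) to 5 (highest 20%) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    levels = [0] * n
    for pos, i in enumerate(order):
        levels[i] = 1 + (5 * pos) // n
    return levels


# Five hypothetical zones with increasing transit supply
zones = [
    {"v": 10.0 * k, "a": 1.0, "s": 40.0, "t": float(k), "r": 0.0,
     "P": 100.0, "E": 0.0, "o": float(k), "q": 0.0}
    for k in range(1, 6)
]
scores = lita(zones)
levels = quintile_levels(scores)
print(scores, levels)
```

Because the overall score is an average of z-scores, it is centred on zero across zones, which is why earlier studies added a constant before grading.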
Figure 2.19: Overall LITA score for the GVRD (Hariwal et al. [18]).

2.2 Collision Prediction Models

Collision prediction models (CPMs) attempt to explain the relationship between accident occurrence in road facilities and the road characteristics (traffic, geometry, etc.). CPMs are a valuable tool with diverse applications, such as estimating the safety potential of a facility, detecting and ranking accident-prone locations, evaluating the effectiveness of safety improvement measures, and safety planning (Sayed and Sawalha [26]). Collisions are discrete, non-negative and rare events; therefore it has been a challenge to produce appropriate statistical models to explain them. Initially, models were developed using a linear relationship. However, further research showed that the basic assumptions under which a linear regression is suitable (normal error structure, constant error variance and the actual existence of a linear relationship) do not hold for collision occurrence data sets (Miaou and Lum [19]).

Statistical modeling of CPMs is usually performed using a Generalized Linear Model (GLM), which allows for a non-normal error structure (typically Poisson or Negative Binomial). The GLM approach overcomes the limitations of linear regression models and produces better fitted models. A Poisson or Negative Binomial error distribution has become the standard assumption for CPMs, and the literature contains many models produced this way (Hauer et al. [10]; Miaou and Lum [19]; Sayed and Rodriguez [24]; Sawalha and Sayed [26]). The procedures to develop appropriate CPMs using the GLM approach are explained in the following sections, including: model form, selection of error structure, selection of explanatory variables, outlier analysis and evaluation of goodness of fit.
2.2.1 Model Form

The model used for any CPM must satisfy two conditions:

1. The model must produce logical outputs; therefore it cannot predict negative collisions. Also, at a zero exposure condition collisions must equal zero; for example, if there are no vehicles on the road (vehicles being an example of exposure) no collisions can occur. The zero exposure and non-negative collision requirements expose one of the major limitations of linear regression, since a linear model could lead to illogical results (Lovegrove [17]).

2. In order to use a GLM, there must be a link function able to transform the model into a linear form, which is needed to perform the parameter estimation.

Based on empirical studies (e.g. Hauer et al. [10]; Miaou and Lum [19]; Kulmala [14]; Sawalha and Sayed [26]), the suggested model form is composed of an exposure measure (e.g. traffic volume or vehicle kilometers traveled) raised to some power and multiplied by an exponential function of the other, non-exposure explanatory variables. The model can be expressed mathematically as:

$$ E(Y) = a_0 \, L^{a_1} \, V^{a_2} \exp\Big(\sum_j b_j x_j\Big) \tag{2.52} $$

Where $E(Y)$ is the predicted collision frequency, $L$ is the length of the road section studied, $V$ is the AADT, $x_j$ are any other explanatory variables and $a_0$, $a_1$, $a_2$ and $b_j$ are parameters of the model. The link function in this case is the logarithmic function. Sayed and Sawalha [26] showed that the linearized model can be expressed as follows:

$$ \ln(E(Y)) = \ln(a_0) + a_1 \ln(L) + a_2 \ln(V) + \sum_j b_j x_j \tag{2.53} $$

2.2.2 Error Structure

The GLM approach for developing traffic accident models assumes an error structure that is Poisson or Negative Binomial. To determine which error structure is more adequate, a Poisson structure is assumed initially and the parameters of the distribution are estimated.
Then the dispersion parameter $\sigma_d$ is calculated as:

$$ \sigma_d = \frac{\text{Pearson } \chi^2}{n - p} \tag{2.54} $$

Where $n$ is the number of observations and $p$ is the number of parameters in the model. The Pearson $\chi^2$ is expressed mathematically as follows:

$$ \text{Pearson } \chi^2 = \sum_{i=1}^{n} \frac{(y_i - E(Y_i))^2}{Var(Y_i)} \tag{2.55} $$

Where $y_i$ is the observed number of accidents on section $i$, $E(Y_i)$ is the frequency of accidents predicted by the model for section $i$ and $Var(Y_i)$ is the variance of the frequency of accidents for section $i$. If the value of the dispersion parameter $\sigma_d$ is equal to 1.0 or lower, the Poisson assumption is adequate. However, if it is greater than 1.0, the data has greater dispersion than can be explained by the Poisson distribution and a Negative Binomial error structure will provide a better fit.

2.2.3 Selection of Explanatory Variables

Many factors influence collision occurrence; however, including them in a CPM as variables is a complicated task. First, factors which intuitively or by experience influence collision occurrence are selected. Then a way of representing them as variables must be chosen. Note that explanatory variables in the same model must be independent (not correlated). In the case of CPMs, the recommended statistical methodology to add explanatory variables to a model is the forward stepwise procedure (Sawalha and Sayed [26]). The procedure is as follows:

• Variables are added one by one and their significance is tested. Variables representing exposure must be included first.

• To test the significance of each particular variable, three tests must be performed:

1. First, the t-statistic of the variable must be significant at the 95% confidence level (t-statistic higher than 1.96).
2. Second, the sign of the variable must be logical.
3.
Third, the addition of the variable should reduce the scaled deviance SD (see Section 2.2.4) by more than $\chi^2_{0.05,1} = 3.84$, that is, at the 95% confidence level. This drop in the scaled deviance is required to confirm that the new variable really improves the model and is independent from the variables already in the model (Sawalha and Sayed [26]).

• If the variable meets the three criteria above, it can stay in the model. Then the next variable is added to the model and tested for significance. These steps are repeated until there are no more variables left to evaluate.

2.2.4 Evaluation of Goodness of Fit

Goodness of fit measures how well the model predicts or fits the observed data. There are many methods of assessing the goodness of fit of a GLM, both qualitative and quantitative. Three quantitative statistics typically used are the Pearson $\chi^2$ (described previously in Section 2.2.2), the scaled deviance SD and the shape parameter $\kappa$.

The scaled deviance SD is the likelihood ratio test statistic, equal to twice the difference between the maximized log-likelihoods of the complete (saturated) model and the studied model. Note that the complete model has the maximum log-likelihood achievable for a given set of data; therefore it sets the baseline for evaluating the goodness of fit of all other “less complete” models. If the error structure is Poisson distributed, the scaled deviance is calculated as:

$$ SD = 2 \sum_{i=1}^{n} y_i \ln\left(\frac{y_i}{E(Y_i)}\right) \tag{2.56} $$

On the other hand, if the error structure follows a Negative Binomial distribution, the scaled deviance is expressed as:

$$ SD = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{E(Y_i)}\right) - (y_i + \kappa) \ln\left(\frac{y_i + \kappa}{E(Y_i) + \kappa}\right) \right] \tag{2.57} $$

A model with proper fit will have Pearson $\chi^2$ and SD statistics lower than the distribution table value for $(n - p - 1)$ degrees of freedom at a 95% confidence level.
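The quantitative statistics of Sections 2.2.2 and 2.2.4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the two-observation example are hypothetical, the convention 0 · ln(0) = 0 is assumed for zones with no observed collisions, and in practice these statistics would normally be read from the output of a GLM package rather than computed by hand.

```python
import math


def pearson_chi2(y, mu, var):
    """Eq. 2.55: squared residuals scaled by the model variance."""
    return sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mu, var))


def dispersion(chi2, n, p):
    """Eq. 2.54: Pearson chi-square divided by (n - p)."""
    return chi2 / (n - p)


def _ylog(y, ratio):
    """y * ln(ratio), with the assumed convention 0 * ln(0) = 0."""
    return 0.0 if y == 0 else y * math.log(ratio)


def scaled_deviance_poisson(y, mu):
    """Eq. 2.56 as written in the text (Poisson error structure)."""
    return 2 * sum(_ylog(yi, yi / mi) for yi, mi in zip(y, mu))


def scaled_deviance_nb(y, mu, kappa):
    """Eq. 2.57 (Negative Binomial error structure, shape parameter kappa)."""
    return 2 * sum(
        _ylog(yi, yi / mi) - (yi + kappa) * math.log((yi + kappa) / (mi + kappa))
        for yi, mi in zip(y, mu)
    )


# Hypothetical observed and predicted collision counts for two sections.
y = [3, 1]
mu = [2.0, 2.0]

# Under the initial Poisson assumption the variance equals the mean.
chi2 = pearson_chi2(y, mu, var=mu)
sigma_d = dispersion(chi2, n=len(y), p=1)
sd_poisson = scaled_deviance_poisson(y, mu)
sd_nb = scaled_deviance_nb(y, mu, kappa=2.0)
```

A value of `sigma_d` above 1.0 would point to the Negative Binomial error structure, after which `scaled_deviance_nb` and the Pearson statistic are compared against the tabulated chi-square value.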
In the case of $\kappa$ there is no minimum value suggested, but a review of previous CPMs showed that $\kappa$ is usually higher than 1.0 (Sawalha and Sayed [26]).

Other, subjective measures of goodness of fit are also used. One example is a plot of the mean collision frequency predicted by the model, $E(\Lambda_i)$, versus the observed collisions; a well fitted model will show points around the 45° line. Another measure of fit is a plot of the Pearson residuals $PR_i$ versus the predicted collisions, where the Pearson residual is defined as:

$$ PR_i = \frac{E(\Lambda_i) - y_i}{\sqrt{Var(y_i)}} \tag{2.58} $$

A well fitted model will have values of $PR_i$ around zero for all values of predicted collisions $E(\Lambda_i)$. Another subjective approach is to plot the average of squared residuals versus the predicted collision frequency. The average of squared residuals is expressed as follows:

$$ \text{Average } SR = \frac{\sum_{i=1}^{n} [E(\Lambda_i) - y_i]^2}{n} \tag{2.59} $$

A well fitted model will show a graph where the points are located close to the variance function. In the case of a Negative Binomial error structure, the variance function is:

$$ Var(y_i) = E(y_i) + \frac{E(y_i)^2}{\kappa} \tag{2.60} $$

2.2.5 Outlier Analysis

While developing CPMs, models sometimes do not meet one or more of the goodness of fit criteria. Not achieving a well fitted model could be due to the presence of extreme values which are not typical of the rest of the data set. These extreme values are called outliers; they exist because of errors in the data collection or simply because the data points are genuinely atypical. Outlier analysis is a procedure used to remove those extreme values from the data set. Generally this “refinement of the data set” improves the fit of the models, especially if the SD and/or Pearson statistics are slightly above the value from the distribution table. The methodology for the outlier analysis is based on the Cook’s distance as described by Sayed and Rodriguez [24] and Sawalha and Sayed [26].
The Cook’s distance is defined as:

$$ CD_i = \frac{h_i}{p\,(1 - h_i)} \left(r_i^{PS}\right)^2 \tag{2.61} $$

Where $h_i$ is the leverage value, $r_i^{PS}$ is the standardized residual of point $i$ and $p$ is the number of parameters in the model. Additionally, $r_i^{PS}$ is expressed as:

$$ r_i^{PS} = \frac{\hat{y}_i - y_i}{\sqrt{(1 - h_i)\,Var(y_i)}} = \frac{PR_i}{\sqrt{1 - h_i}} \tag{2.62} $$

Where $\hat{y}_i = E(\Lambda_i)$ is the value predicted by the model for point $i$. The Pearson residual $PR_i$ measures how well the model fits the i-th observation and shows how distant point $i$ is from the rest of the data set (Lovegrove [17]). The outlier analysis also follows a forward stepwise process:

1. Calculation of CD for all points.
2. Removal of the point with the highest CD value.
3. Re-estimation of the CPM, keeping $\kappa$ fixed at the value of the previous model.
4. If the change in SD is at least $\chi^2_{0.05,1} = 3.84$, the model is re-estimated, producing new parameters including a new $\kappa$ and new $CD_i$ values for all remaining points.
5. The procedure is repeated until the change in SD becomes less than $\chi^2_{0.05,1} = 3.84$. Once the last outlier is removed from the data set, all the model parameters are re-estimated one last time, including $\kappa$.

2.2.6 The Safety Planning Approach and Macro Collision Prediction Models

In many cases, professionals design and build road facilities without considering safety issues at early stages (i.e. the planning stage). Safety improvements are typically reactions to problems occurring at existing road facilities. This reactive approach is, however, not enough: professionals should also aim to address safety proactively, meaning that potential safety problems must be dealt with before they emerge (DeLeur and Sayed [4]).

There are reliable empirical tools already developed to improve safety from the reactive approach, for example micro-level collision prediction models. Tools for the proactive approach, however, are still at an early stage of research and are not yet considered very reliable (DeLeur and Sayed [4]).
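As an aside on the outlier analysis of Section 2.2.5, Equations 2.61 and 2.62 and the stopping rule of the stepwise removal can be sketched as follows. The function names and the numerical values are hypothetical; the Pearson residuals, leverage values and scaled deviances are assumed to come from a fitted GLM, whose re-estimation (steps 3 and 4) is not shown here.

```python
import math


def standardized_residual(pr, h):
    """Eq. 2.62: Pearson residual divided by sqrt(1 - leverage)."""
    return pr / math.sqrt(1.0 - h)


def cooks_distance(pr, h, p):
    """Eq. 2.61 for one observation with leverage h and p model parameters."""
    r = standardized_residual(pr, h)
    return h * r ** 2 / (p * (1.0 - h))


def most_influential(pearson_residuals, leverages, p):
    """Index and Cook's distance of the point with the highest CD (steps 1-2)."""
    cds = [cooks_distance(pr, h, p)
           for pr, h in zip(pearson_residuals, leverages)]
    i = max(range(len(cds)), key=cds.__getitem__)
    return i, cds[i]


def removal_is_justified(sd_before, sd_after, threshold=3.84):
    """Stopping rule (steps 4-5): the drop in SD must reach chi2(0.05, 1)."""
    return (sd_before - sd_after) >= threshold


# Hypothetical residuals and leverages for three zones, model with p = 2.
index, cd = most_influential([0.5, 2.0, -1.0], [0.1, 0.5, 0.2], p=2)
```

In a full analysis, the point at `index` would be dropped, the model refitted with $\kappa$ held fixed, and `removal_is_justified` evaluated on the scaled deviances before and after the removal.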
Proactive safety tools available include road safety audits, road safety risk indexes, sustainable road safety programs and CPMs. This section reviews CPMs applied to safety planning evaluation.

The first studies developing CPMs as planning tools to explain auto collisions were performed by Ho and Guarnashelly [11] and Lord and Persaud [23]. Both studies applied micro-level models, which determined the level of safety for single locations (e.g. an intersection or road segment) based on an exposure variable. In both cases the exposure variable was traffic volume, obtained from Emme/2 forecasts. However, the two studies found micro-level models to be very sensitive to traffic volume data (traffic volume explained around 50% of the systematic variation in collision occurrence). Collision prediction accuracy therefore depended on the quality of the traffic volume forecasts, which generally have an error ranging from 40 to 50% (Lin and Navin [16]). Additionally, micro-level models applied to planning forecasts are data intensive and assume that all variables impacting safety other than the exposure variable (i.e. traffic volume) remain fixed. Since micro-level models are developed at the single location level (e.g. intersections or road segments), they are also unable to predict safety accurately in the long run, among other issues. Thus, further studies performed in the 1990s suggested that macro-level CPMs could be a useful alternative tool for proactive road safety planning (Lovegrove and Sayed [17]). In macro-level CPMs, explanatory variables are first aggregated at the zonal level and then related to collision occurrence, in contrast to micro-level CPMs, where single facilities, nodes (intersections) or links (road segments) are used as inputs.

Three important studies were found to apply macro-level collision prediction models for evaluating safety at the planning stage. The first was performed by Hadayeghi et al.
in 2003, using data from 463 traffic zones in Toronto. The models were developed following the Generalized Linear Regression (GLM) technique. The resulting models related zonal collisions to zonal traffic and other zonal attributes, including vehicle kilometers traveled (VKT), major road lane kilometers, minor road lane kilometers, area, posted speed, average congestion, intersection density, number of households and employed labour force (Lovegrove and Sayed [17]). The models predicted total or severe collisions, for all-day or peak-hour scenarios, based on the following mathematical form:

$$ E(\Lambda) = a_0 \, VKT^{a_1} \, e^{\sum_i b_i x_i} \tag{2.63} $$

Where:
$E(\Lambda)$ = Dependent variable (mean collision frequency)
$a_0$, $a_1$, $b_i$ = Model parameters
$VKT$ = Zonal total vehicle kilometers traveled from the Emme/2 forecast
$x_i$ = Zonal aggregated explanatory variables

The principal conclusions from the models were:

• High collision occurrence was related to zones having high vehicle kilometers traveled, major road kilometers, intersection density and number of households.

• Low collision frequency was associated with high average posted speed and high average zonal congestion. A possible reason for the result obtained in areas with high average posted speeds is that facilities operating at high speed are designed to be safer. High average zonal congestion, in turn, can decrease collisions because congestion leads to low operating speeds.

• Increased morning peak collisions were correlated with zones having a large employed labour force and many minor road kilometers.

The second study was developed by Ladron de Guevara et al. in 2004 for 859 traffic zones in Tucson, Arizona. Various CPMs were developed to predict fatal, injury and property damage only (PDO) collisions. The models were developed applying a non-linear exponential function and assuming a Negative Binomial error distribution.
One distinctive feature of this study is the use of population density as the leading exposure variable, instead of the traditionally recommended exposure variables (e.g. VKT, AADT). The reason for using population density was the lack of reliable traffic volume forecasts. Furthermore, users of the models in small communities may not have Emme/2 models available to forecast VKT but will have access to population data. The following conclusions were obtained from the models:

• High population density was related to high collision occurrence.

• Zones having high employment, intersection density, major arterial roads and minor arterial roads were linked with increasing injury and PDO collisions.

• High intersection density was associated with decreasing severe collisions. This is related to slow average speeds at intersections.

The third study was performed by Lovegrove and Sayed in 2005, for 577 zones in the Greater Vancouver Regional District, Canada. The models were developed following the Generalized Linear Regression (GLM) technique and assuming a Negative Binomial error distribution. The models described traffic collisions (either total or severe) during the AM peak scenario and for two types of land use (urban and rural). The model form used was:

$$ E(\Lambda) = a_0 \, Z^{a_1} \, e^{\sum_i b_i x_i} \tag{2.64} $$

Where:
$E(\Lambda)$ = Mean collision frequency for a period of three years
$a_0$, $a_1$, $b_i$ = Model parameters
$Z$ = A zonal exposure variable
$x_i$ = Zonal aggregated explanatory variables

The principal exposure variable used was either vehicle kilometers traveled (VKT) or total lane kilometers (TLKM). In this study, exposure was obtained from two sources: measured with GIS software or modeled from the Emme/2 forecasts. CPMs were developed with both measured and modeled exposure variables in order to compare the results. The idea was to offer practitioners with no access to Emme/2 models an alternative way to estimate exposure variables.
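To make the macro-level model form of Equation 2.64 concrete, the short Python sketch below evaluates it for one zone and checks it against its log-linear form. All parameter values and zone attributes are hypothetical and chosen only to illustrate the arithmetic; they are not estimates from Lovegrove and Sayed [17] or from any other study.

```python
import math


def predict_collisions(a0, a1, exposure, b, x):
    """Eq. 2.64: E = a0 * Z**a1 * exp(sum(b_i * x_i))."""
    return a0 * exposure ** a1 * math.exp(sum(bi * xi for bi, xi in zip(b, x)))


def log_linear_predictor(a0, a1, exposure, b, x):
    """Same model after the log link: ln E = ln a0 + a1 ln Z + sum(b_i x_i)."""
    return (math.log(a0) + a1 * math.log(exposure)
            + sum(bi * xi for bi, xi in zip(b, x)))


# Hypothetical parameters and zone attributes (illustration only).
a0, a1 = 2.0, 0.5
b = [0.1, -0.2]
zone = {"exposure": 100.0, "x": [3.0, 1.0]}

expected = predict_collisions(a0, a1, zone["exposure"], b, zone["x"])
no_exposure = predict_collisions(a0, a1, 0.0, b, zone["x"])
```

With a positive exponent on the exposure variable, a zone with zero exposure is predicted to have zero collisions, and the prediction can never be negative, which are the two logical requirements placed on the model form in Section 2.2.1.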
The developed models were grouped into four themes according to their explanatory variables, as follows:

• Exposure: consisting of attributes which describe the number of vehicles, roads and congestion in each zone. Variables in this set include total vehicle kilometers traveled (VKT), total lane kilometers (TLKM), average congestion level, average speed and area.

• Sociodemographic: composed of variables describing residents, workers and zonal land use, including average zonal family size, home density, zonal residents, population density and residents working, among others.

• Network: includes variables describing the road network in each traffic zone. Variables in this group include the number of signals, signal density, number of intersections, intersection density and intersections per lane kilometer, among others.

• Transportation demand management (TDM): includes variables describing the characteristics of the travel demand in each zone. This group consists of the total commuters, the commuter density, the core area, the shortcut capacity and the number of drivers commuting.

The results from the 35 CPMs developed in this study were:

• 15 models were successfully developed based on measured data, allowing practitioners without access to Emme/2 models to apply CPMs regardless.

• Increased collision occurrence was associated with the majority of explanatory variables. However, decreased collision levels were related to larger family sizes, core residential area, number of three-way intersections and local road lane kilometers (Lovegrove and Sayed [17]).

2.2.7 Collision Prediction Models (CPMs) Applied to Urban Transportation Systems

In recent years there has been a significant effort to include safety at the planning stage in North America. Research studies have focused mostly on developing techniques and decision tools which facilitate a proactive safety approach.
An important example of those methodologies is the planning-level collision prediction model. However, most collision prediction models have been developed for auto collisions and do not include explanatory variables related to transit characteristics (Cheung et al. [3]). Thus, CPMs accounting for transit elements are scarce in the literature. Two studies were found to develop transit safety models.

The first study was performed by Jovanis et al. in 1992 using data from the metropolitan Chicago area. The models were developed at the route level using a log-linear regression. They related transit collision frequency on a route to annual revenue miles, weekday average ridership, average weekday morning headway, annual revenue hours, speed of the route and bus driver attributes (e.g. age, gender) (Jovanis [12]). Nevertheless, the models did not include variables related to geometric design or road characteristics (Cheung et al. [3]).

The second study was developed by Cheung et al. in 2008, based on a data set from Toronto, Canada. The models were developed using the Generalized Linear Regression (GLM) technique and assuming a Negative Binomial error structure. Two types of models were developed:

• Zonal level transit-involved collision prediction models: these included transit characteristics, socioeconomic variables and road network variables typically used in transportation planning. The mathematical model form was as follows:

$$ \text{Transit acc/year} = a_0 \, VKT^{b_0} \, BKT^{b_1} \, e^{\sum_i b_i x_i} \tag{2.65} $$

Where:
Transit acc/year = Predicted transit-involved collision frequency per year
$VKT$ = Vehicle kilometers traveled
$BKT$ = Bus or streetcar kilometers traveled
$x_i$ = Other explanatory variables
$a_0$, $b_0$, $b_1$, $b_2$ and $b_i$ = Model parameters

From the analysis of the zonal models, it was concluded that transit related collisions were positively correlated with VKT, BKT, arterial road kilometers, bus stop density and percentage of near sided stops.
Additionally, low collision occurrence was associated with zones with high average posted speed and far sided stops.

• Arterial level collision models: these aim to explain the relationship between collisions (auto and transit), road geometry and transit route characteristics. The model form is expressed below:

$$ \text{Total acc/year} = a_0 \, (AADT/1000)^{b_0} \, f^{\,b_1} \, L^{b_2} \, e^{\sum_i b_i x_i} \tag{2.66} $$

Where:
Total acc/year = Predicted all-traffic collision frequency per year
$f$ = Transit service vehicle frequency
$L$ = Arterial segment length

The arterial level models showed that increased AADT, transit frequency, arterial road segment length, percentage of near sided stops and presence of on-street parking are related to increased collision frequency. In addition, the presence of far sided stops seems to be linked with low collision occurrence.

Although the studies described represent an important effort in associating some transit attributes with collision occurrence, further research in transit and planning is required to enhance the validity of existing models and to add potentially relevant missing variables (Cheung et al. [3]).

Chapter 3

Methodology

3.1 Data

3.1.1 Geographical Scope

The procedures described in this section will be applied to the Greater Vancouver Regional District (GVRD) transit network. The GVRD is located in the province of British Columbia, Canada (see Figure 3.1). It is composed of 21 municipalities and has an approximate land area of 3,000 square kilometers. According to the census of 2006, the GVRD has a population of 2.2 million inhabitants living in 800,000 households, and 1,200,000 jobs (Census Canada [2]). The majority of the residential and employment areas are located in the western part of the region surrounding the Central Business District (CBD).
In contrast, low density suburban residential, agricultural and industrial lands are situated in the south and east parts of the region (Lovegrove and Sayed [17]). Figure 3.2 shows the spatial distribution of population and job density; in both cases the tendency is for high densities close to regional activity centers and inside the CBD.

The GVRD public transportation network is composed of buses, rail and ferries. The rail system and the express bus routes are shown in Figure 3.3; the complete bus system is presented in Figure 3.4. The focus of this research will be on light rail and bus systems only. The bus network has a route length of 3,800 km and is composed of around 180 bus routes and 7,000 bus stops. The light rail network consists of four lines with a total length of 99 km. Based on the census of 1996 there were 800,000 commuters, of which 70% used private vehicles and 14.3% used transit as their mode of transportation. Figure 3.5 illustrates the spatial distribution of the bus stop density and the percentage of transit commuters. Similar to population and employment density, bus stop density and transit commuter percentage present high values in the CBD and the surrounding areas, while lower values are found when moving away from the CBD into more suburban areas.

Figure 3.1: Map of the GVRD

3.1.2 Aggregation

The aggregation units were based on the 577 traffic analysis zones (TAZs) used by the GVRD in their Emme/2 four-stage transportation planning model. The TAZ sizes in the model were specifically chosen with the goal of having enough data points in each zone, which enables the development of planning and safety models with an adequate goodness of fit. Also, TAZ boundaries coincide with census tract and municipal boundaries, allowing for easy data integration (Lovegrove and Sayed [17]).
Aggregation of data using the GVRD transportation planning model TAZs was selected according to the main goal of this research: the application of transit indicators in the development of macro-level collision prediction models (CPMs), which are traditionally developed at the zonal level. Additional reasons for using a zonal analysis included the “creation” of zonal networks of more manageable size, which allowed transit indicators (for example connectivity) to be estimated more easily and efficiently. Finally, estimating zonal transit indicators instead of analyzing the whole network allows such indicators to be compared between the various TAZs.

Figure 3.2: Sociodemographic data for the Greater Vancouver Regional District (GVRD): (a) Population density (inhab/Ha); (b) Job density (jobs/Ha)

Figure 3.3: General map of the public transportation system in the GVRD (TransLink Web Page, 2011)

Figure 3.4: Map of the complete transit system in the GVRD (TransLink Web Page, 2011)

3.1.3 Sources

The list of the variables used and their summary statistics are presented in Table 3.1. The data was extracted and compiled from three main sources:

1. TransLink, the Greater Vancouver Regional District (GVRD) transportation authority, provided geocoded files of land use, road network, and zone/census tract boundaries from the year 1998, as well as geocoded files for transit (bus routes and stops) and rail (lines and stations) from the year 2000. Additionally, TransLink provided two spreadsheets:

(a) The first contains Emme/2 transportation planning model outputs, consisting of travel demands (i.e. vehicle and transit kilometers traveled) for the AM peak scenario for base year 1996.

(b) The second spreadsheet is called “Route Design Summary” and is dated September 1998. It includes frequencies, types of vehicles and service time span information for each bus route.

2.
Census Canada (1996): provided socioeconomic (population, employment, etc.) and mode split data for each zone from the 1996 census. The 1996 census data was selected because there was a transit strike in Vancouver when the 2001 census was performed, which makes the 2001 data inadequate for the objectives of this research.

3. The Insurance Corporation of British Columbia (ICBC), a public automobile insurance company, provided geocoded files of collision claims in the GVRD for the years 1996, 1997 and 1998. By using three years of collisions the randomness of the data was decreased, and it was easier to quantify certain types of collisions (e.g. bicycle or pedestrian) for which data is usually sparse (Lovegrove and Sayed [17]). In British Columbia most collision claims are handled by ICBC, so the data is centralized in one place. This differs from many places in North America, where there are several private insurance companies. Having only one insurance company in British Columbia is considered an advantage for overcoming the problem of unreported, unattended or incomplete municipal collision data sets. This missing data situation can be seen, for example, by comparing the high number of collision claims in the GVRD (85,990 on average per year, considering the years 1996 to 1998; urban area = 797 km²) with the claims in Toronto (53,286 in the year 1996; urban area = 634 km²). The incompleteness of collision data affects the development of collision prediction models, as already discussed by Ladron de Guevara et al. [15] and Hadayeghi et al. [9].

Figure 3.5: Transit data for the Greater Vancouver Regional District (GVRD): (a) Bus stop density (No. stops/Ha); (b) Percentage of commuters using transit

Table 3.1: Variables and summary statistics (577 zones) for the Greater Vancouver Regional District (GVRD)

Variable | Description | Units | Year | Source | GVRD Total | Zonal Average | Zonal St. Dev.
Sociodemographic
POP | Population | pop | 1996 | Census | 1926086 | 3336.88 | 2460.052
POPD | Population density | pop/Ha | 1996 | Census | 17106.9 | 29.093 | 34.063
WKG | Employment | jobs | 1996 | Census | 319079 | 542.65 | 740.058
WKGA | Employment density | jobs/Ha | 1996 | Census | 5385.07 | 9.16 | 30.98
TCM | Total commuters | comm. | 1996 | Census | 823724 | 1400.891 | 1103.16
BUSP | Percentage of transit commuters | % | 1996 | Census | 14.3 | 12.51 | 8.41
Transit Network Characteristics
NSTOP | Number of bus stops | stops | 2000 | TransLink | 7866 | 13.38 | 10.51
BDEN | Bus stop density | stops/Ha | 2000 | TransLink | 107.56 | 0.18 | 1.021
NS | Near sided stops | stops | 2000 | TransLink | 870.24 | 1.48 | 1.86
FS | Far sided stops | stops | 2000 | TransLink | 566894 | 9.61 | 8.66
MB | Midblock stops | stops | 2000 | TransLink | 44591 | 0.76 | 1.32
PL | Stops in a parking lane | stops | 2000 | TransLink | 2657 | 4.52 | 6.31
TL | Stops in a through lane | stops | 2000 | TransLink | 4129 | 7.022 | 6.58
BB | Stops in a bus bay | stops | 2000 | TransLink | 626 | 1.065 | 2.19
PS | Stops in a paved shoulder | stops | 2000 | TransLink | 354 | 0.602 | 1.28
GS | Stops in a gravel shoulder | stops | 2000 | TransLink | 93 | 0.16 | 0.54
NR | Number of routes | routes | 2000 | TransLink | 184 | 13.88 | 24.45
L | Total route length | km | 2000 | TransLink | 3854.64 | 6.54 | 8.11
NL | Net route length (without overlapping) | km | 2000 | TransLink | 1656.69 | 2.82 | 2.66
fAM | Sum of route frequencies (AM peak) | bus/hr | 1998 | TransLink | 68615.66 | 124.08 | 114.89
Roadway Network Characteristics
BUSHOV | Bus only HOV lanes | km | 1996 | TransLink | 16.41 | 0.028 | 0.16
TWOHOV | 2+ HOV lanes | km | 1996 | TransLink | 75.54 | 0.13 | 0.73
THRHOV | 3+ HOV lanes | km | 1996 | TransLink | 20.22 | 0.034 | 0.39
Collisions
T3 | Total collisions over 3 years | cols. | 1996-98 | ICBC | 257970 | 451 | 490.19
S3 | Severe collisions (fatal and injury) over 3 years | cols. | 1996-98 | ICBC | 63681 | 114 | 892.15
PDO3 | Property damage collisions over 3 years | cols. | 1996-98 | ICBC | 194289 | 337 | 122.59
Exposure
VKT | Total transit and vehicle kilometers traveled | veh-km | 1996 | TransLink | 1995835 | 3394.28 | 2592.86
VHKT | Total vehicle kilometers traveled | veh-km | 1996 | TransLink | 1981132 | 3369.27 | 2577.94
TKT | Total transit kilometers traveled | veh-km | 1996 | TransLink | 14682 | 24.97 | 28.5
TLKM | Total lane kilometers | lane-km | 1996 | TransLink | 33587.73 | 57.12 | 34.49

3.1.4 Quality Issues

Data quality problems were encountered during the development of this study. Two difficulties in particular are listed below:

1. Integration of transit databases coming from different years and sources. The geocoded files with the bus route and stop information (containing route lengths, number of routes and number of bus stops) are from the year 2000, while the “Route Design Summary” spreadsheet (containing route frequencies, service time span, type of vehicles, etc.) is from 1998. Both sources were provided by TransLink. Additionally, the transit commuting share comes from the Census of 1996. Although the GVRD TAZ and census tract boundaries coincide, the year differences between the three sources generated some inconsistencies and TAZs with missing data points. One example of an inconsistency was a TAZ having no bus stops but a transit mode share above zero. A case of a TAZ with missing data points occurred when routes created after 1998 had no frequency, service time span or type of vehicle available in the 1998 “Route Design Summary” spreadsheet. The inconsistencies were removed before using the integrated transit data set to estimate transit indicators and to develop ridership and collision prediction models.

2. Missing information while performing the bus stop sequencing: in order to estimate the connectivity transit indicators, the bus stop sequence for each route was required. However, the stop sequencing information was not available and the sequencing had to be performed manually based on the bus route maps available at the TransLink web page.
Difficulties arose because the geocoded bus stop files provided were from the year 2000, while the maps obtained online were from 2010. As a result, routes that were changed or discontinued between 2000 and 2010 could not be sequenced. The sequencing was not possible for 18 out of 174 bus routes (10.3% of the bus routes), affecting 2,038 bus stops (28.6% of the bus stops). In addition, the bus stops with missing sequencing are spread over 172 of the 577 TAZs (29.8% of the zones). Figure 3.6 illustrates the spatial distribution of the bus routes sequenced and not sequenced. As shown, the routes not sequenced are distributed randomly all over the GVRD and not in a specific part of the region. Thus, it could be reasonable to assume that errors due to the lack of this data are negligible.

Figure 3.6: Bus routes sequenced versus not sequenced in the GVRD

3.2 Characterization of the Greater Vancouver Regional District (GVRD) Metro Network

According to the literature review, metro networks around the world have been characterized and compared by various authors (Vuchic and Musso [20], Gattuso and Mirello [8], Derrible and Kennedy [5]). However, the GVRD metro network has not been included in those studies. So, as a first approach to achieving a better understanding of the most relevant transit indicators, the GVRD metro network was characterized and compared with the other metro networks previously analyzed. The steps followed for this task are described below.

First, the basic characteristics of the GVRD metro network were collected, including:

• Route length $R$
• Number of lines $N_L$
• Number of stations $N_S$

Then, the metro network was redrawn as a graph following the methodology used by Derrible and Kennedy [5]. According to the procedure, only transfer and end stations are considered as vertices of the graph, while intermediate stations are not included. In addition, the edges (links) are classified as single or multiple depending on the number of rail lines going through them. The specific details of how to transform metro networks into graphs are explained in Section 2.1.6.

Once the metro is converted into a graph, the following basic measures of the graph were obtained by visual inspection:

• Number of Vertices $V$
• Number of Transfer Vertices $V^T$
• Number of End Vertices $V^E$
• Number of Edges $E$
• Number of Single Edges $E^S$
• Number of Multiple Edges $E^M$
• Maximum number of Transfers $\delta$

Using the measurements obtained from the previous steps as inputs, some of the most important transit indicators were calculated. The indicators computed are listed below:

• Degree of connectivity $\gamma$
• Complexity $\beta$
• Structural connectivity $\rho$
• Directness $\tau$
• Average line length $A$

Finally, the indicators estimated for the GVRD metro network were contrasted with the indicators of other metro networks from around the world previously characterized.

3.3 From Bus Networks to Graphs

The approach used by Derrible and Kennedy [5] to transform metros into graphs $G(V, E)$ will be applied for the first time to a bus network. The objective is to collect the basic measurements of the graph representation, such as:

• Number of Vertices $V$
• Number of Transfer Vertices $V^T$
• Number of End Vertices $V^E$
• Number of Edges $E$
• Number of Single Edges $E^S$
• Number of Multiple Edges $E^M$

Later on, based on the measurements obtained from the graph representation, we will be able to characterize a bus network using different transit indicators.
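As a rough illustration of the reduction described above (keeping only end and transfer stops as vertices), a minimal Python sketch follows. The function name, the toy routes and the rule that a transfer stop is any stop served by more than one route are assumptions made for illustration; they are not taken from Derrible and Kennedy [5] or from the GVRD data.

```python
from collections import Counter


def reduce_routes_to_graph(routes):
    """Reduce routes to a graph whose vertices are end and transfer stops.

    routes: dict mapping a route id to its ordered list of stop ids
            (one direction of travel).
    Assumption: a transfer stop is any stop served by more than one route.
    """
    # Count how many distinct routes serve each stop.
    served_by = Counter()
    for stops in routes.values():
        for stop in set(stops):
            served_by[stop] += 1

    end_vertices = set()
    for stops in routes.values():
        end_vertices.update((stops[0], stops[-1]))
    transfer_vertices = {s for s, n in served_by.items() if n > 1}
    vertices = end_vertices | transfer_vertices

    # Directed edges join consecutive kept vertices along each route;
    # intermediate stops are skipped.
    edges = []
    for route_id, stops in routes.items():
        kept = [s for s in stops if s in vertices]
        edges.extend((u, v, route_id) for u, v in zip(kept, kept[1:]))
    return vertices, end_vertices, transfer_vertices, edges


# Hypothetical toy network: two routes that cross at stop "c".
routes = {
    "R1": ["a", "b", "c", "d"],
    "R2": ["e", "c", "f"],
}
vertices, ends, transfers, edges = reduce_routes_to_graph(routes)
print(sorted(vertices), sorted(transfers), edges)
```

In this toy case the intermediate stop "b" is dropped, "c" is kept as a transfer vertex, and each route contributes two directed edges.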
However, regarding the conversion of the bus network into a graph, it is important to mention that the methodology followed by Derrible and Kennedy [5] was designed for metros, not bus networks. Therefore, some adaptations to the specific characteristics of bus networks must be made beforehand. In fact, this is the first time graph theory transit indicators will be applied to characterize a bus network. Some of the particular features of bus networks which required the original methodology for calculating the indicators to be changed are explained below:

• Increased complexity of bus networks compared to metro networks: the number of vertices $V$ and edges $E$ (links) is higher by orders of magnitude in the case of bus networks. To illustrate, the GVRD metro network consists of 47 stations and 4 lines, while the bus network is composed of approximately 7,000 stops and 180 routes. The procedure followed by Derrible and Kennedy [5] already reduces the complexity by only considering end and transfer stops as vertices. But even when removing intermediate stops, the bus network graph representation still has a very high number of vertices $V$ and edges $E$. For the GVRD case, the resultant graph representing the bus network has around 5,000 vertices and 4,800 edges. So, in order to have graphs of a more manageable size, the analysis will be performed using aggregated data based on traffic analysis zones (TAZ). The traffic zones selected for the analysis of the GVRD are described in further detail in Section 3.1.2. The zonal analysis was not selected only to obtain graphs of manageable size that are easier to study afterward.
Other important reasons included:

– Allowing for comparisons between networks: to the best of the author's knowledge, bus networks have never been characterized before using graph theory transit indicators, meaning there is no possibility of contrasting the results with bus networks from other cities. Having only the overall values of the indicators for the GVRD would therefore not be very meaningful by itself. A zonal analysis at least allows the comparison between networks from different traffic zones and the identification of zones which might need improvement on a particular indicator, among other applications.

– The availability of GVRD data (traffic, demographic, safety, etc.) already sorted by TAZ (see Section 3.1.3).

– The intent of including the zonal transit indicators afterward in the development of macro-level collision prediction models (CPMs), which are developed at the zonal level as well. This part is explained in closer detail in Section 3.7.

• Bus routes are not always “two-way”: in the case of metro networks, lines can usually be assumed to be “two-way”, therefore they can easily be represented as undirected graphs. However, in the case of bus transit most of the routes do not follow the same path or the same sequence of stops in both route directions. Directed graphs were therefore selected as a more appropriate graph representation. The use of directed graphs requires deciding on a scheme to count the number of directed single edges $E^S$ and multiple edges $E^M$.

• Walking transfers: generally the distance between bus stops is smaller than between metro stations. Due to this fact, transit networks sometimes rely on transfers from one route to another by means of an intermediate walking transfer. When representing bus networks as graphs, this additional set of links (walking links $E^W$) must be considered as well.
Based on all the issues discussed above, the methodology to redraw metro networks as graphs was adapted for the case of bus networks. The steps followed to transform the whole bus network into a graph representation are shown in Figure 3.7 and are described below:

1. The whole bus network is redrawn as a graph by only considering the transfer and end bus stops as vertices.

2. The graph representation is divided by TAZ, creating various zonal graph representations. In order to split the graph the following rules were applied:

(a) Dividing cross-zonal links: links (edges) connecting two vertices located in two different zones are broken by keeping, as part of each zonal graph, only the links directed towards the zone being studied. In Figure 3.7 a simple example of a network composed of two zones is cropped to illustrate. The link going from 6 to 1 (directed towards zone A) will belong to the graph representation of zone A, while the link going from 1 to 6 (directed towards zone B) will be part of the graph of zone B.

(b) Vertices outside the zone boundary: after dividing cross-zonal links by zones, the vertices that end up outside the zone being studied, but are still connected to vertices inside the zone, will not be removed from the graph. In the sample exercise of the two zones being cropped shown in Figure 3.7, vertex 1 will be kept as part of the zonal graphs for both zones. The same applies to vertices 4 and 6; they will not be removed.

3. Correcting the zonal graphs: splitting the links (edges) by zones might have created situations where vertices are neither transfer nor end vertices anymore. Zonal graphs must be corrected by removing those “new intermediate vertices”. Figure 3.7 shows how vertex 1 becomes intermediate after cutting the graph by zones and is removed afterwards from the graph.

4. Walking links (edges) $E^W$ are added to the zonal graphs based on the following rules:

(a) There will be a walking link connecting any two vertices if there is a distance of 80 m or less between them. The selection of a distance range of 0 to 80 meters was based on the typical size of an intersection diameter.

(b) The two vertices involved must have different bus routes stopping at them (at least one route must be different, so the link is not redundant).

(c) Walking links are assumed to be undirected, meaning they connect vertices in both directions.

5. Vertices connected by a walking link are merged into a single vertex. The combining of stations was done assuming that the frequency of a walking link is infinite. Thus, if an infinite frequency link connects two vertices, it could be reasonable to assume that merging them creates an adequate equivalent network.

6. Once zonal graphs are corrected, with walking links added and vertices merged, the following measurements can be collected:

(a) $N_R$: number of routes.

(b) $L$: total route length, which represents the sum of the individual route lengths $L_i$.

(c) $V$: number of vertices. After splitting the links by zones, the vertices that end up outside the zone being studied, but are still connected to vertices inside the zone, will be counted as one over the number of zones they are connecting. In the sample exercise of zones A and B being cropped shown in Figure 3.7, vertex 6 will be kept as part of the zonal graphs for both zones. However, vertex 6 will be counted as 0.5 of a vertex (1 vertex / 2 zones connected) for zone A and 0.5 of a vertex for zone B. The same will apply for vertex 4. While vertex 6 will be counted as just one vertex for zone B.
(d) $E$: number of edges (composed of single edges $E^S$ and multiple edges $E^M$). In the analysis of metro networks made by Derrible and Kennedy [5] it was assumed that all edges were undirected, and one undirected single edge connecting two vertices was counted as one single edge ($E^S = 1$). In this case directed graphs are being used, thus one directed single edge linking two vertices will be counted as half of a single edge ($E^S = 0.5$). The same counting method will be applied for the case of multiple edges $E^M$.

Figure 3.7: Example of conversion of a bus network into a graph. Panels: (a) Whole network graph representation; (b) Creation of zonal graphs; (c) Correction of zonal graphs; (d) Creation of walking links; (e) Walking links merged.

3.4 Characterization of the GVRD Bus Network Using Existent Transit Indicators

Based on the measurements obtained from the graph representation, many graph theory based transit indicators can be estimated. For the GVRD case study, the existing indicators listed in Table 3.2 were calculated for each zonal graph. Of the indicators listed, alterations and assumptions were needed only for the assessment of the coverage $\sigma$ and the LITA indicators, as explained below:

• Coverage $\sigma$: in order to adapt the indicator, the radius $r$ chosen around each bus stop was smaller than the values typically suggested when analyzing metro networks. A radius $r = 50$ m was selected, as it was the minimal distance between bus stops for our case study (the Greater Vancouver Regional District). This radius avoided overlapping of the circular areas around the bus stops. Note that coverage $\sigma$ is a function of the number of stops $N_{STOP}$ and shall not be confused with the number of vertices $V$. Finally, the $A_{Served}$ for each zone was estimated as the sum of the areas labeled with the following land uses: residential, commercial, industrial, institutional and transportation.
• Local Index of Transit Availability (LITA): the calculation was based only on the AM bus route frequencies to obtain the number of vehicles $v_i$ entering each zone. The number of seats for each bus $s_i$ was assumed to be equal to 40, as most of the buses in the GVRD are 40 foot vehicles (which typically have a capacity of 30 to 40 seats). The area of land developed $a_i$ was assumed to equal the $A_{Served}$. Finally, the overall LITA score was calculated as suggested by Wiley [28], but instead of using his discrete scores (from 1 to 5), the percentile in which the z-score was located was taken as the overall LITA score.

| Variable | Symbol | Equation | Comments | Proposed by |
|---|---|---|---|---|
| Geographical Indicators | | | | |
| Average Edge Length | $\eta$ | $\eta = L/E$ | | Kansky [13] |
| Theta | $\theta$ | $\theta = L/V$ | Using length of the network $L$, instead of traffic volume $T$ | Kansky [13] |
| Inter-station spacing | $IS$ | $IS = L_N / N_{STOP}$ | | Vuchic and Musso [20] |
| Overlapping degree | $OD$ | $OD = 1 - L_N / L$ | | Vuchic and Musso [20] |
| Connectivity Indicators | | | | |
| Structural Connectivity | $\rho$ | $\rho = (V_c^t - E^M)/V^T$ | | Derrible and Kennedy [5] |
| Complexity | $\beta$ | $\beta = E/V$ | Number of edges $E$ as defined in Equation 2.42 | Garrison and Marble [7] |
| Degree of connectivity | $\gamma$ | $\gamma = E/(3(V-2))$ | Number of edges $E$ as defined in Equation 2.42 | Garrison and Marble [7] |
| Other Indicators | | | | |
| Coverage | $\sigma$ | $\sigma = N_{STOP} \cdot \pi r^2 / A_{Served}$ | With $r = 50$ m and $A_{Served}$ according to land use | Derrible and Kennedy [5] |
| Local Index of Transit Availability | LITA | Eqs. 2.49, 2.50 and 2.51 | With $f_i$ of the AM peak, seats $s_i = 40$ pax and area developed $a_i = A_{Served}$ | Rood [25] |

Table 3.2: Existent transit indicators estimated

The remaining indicators ($\beta$, $\gamma$ and $\rho$) were estimated as specified by their authors. However, during their calculation some issues were found in the case of the degree of connectivity $\gamma$ and complexity $\beta$. Two types of inconsistencies appear when comparing various networks against each other based on these two connectivity indicators:

1. Inclusion of multiple edges $E^M$ in the calculations of $\gamma$ and $\beta$ can lead to misinterpretation of the value of the connectivity in a network. For example, Figure 3.8 shows three networks with different configurations (configuration being defined as the way the edges $E$ are related to pairs of vertices $V$), each having three vertices and three routes (R1, R2 and R3). However, when estimating the connectivity values ($\gamma$ and $\beta$, as presented in Figure 3.8(d)), the result is the same for the three of them. This is not exactly correct. In reality, the most connected network based on just configuration would be network (a), as it has the maximum number of edges or links possible for a graph of three vertices. In contrast, and based solely on configuration, network (c) would be the one with the least connectivity, as there are no links available between vertices $V_1$ and $V_2$ and between vertices $V_1$ and $V_3$.

| Network | Single Edges $E^S$ | Multiple Edges $E^M$ | Edges $E$ | Vertices $V$ | Degree of Connectivity $\gamma$ | Complexity $\beta$ |
|---|---|---|---|---|---|---|
| a | 3 | 0 | 3 | 3 | 1 | 1 |
| b | 2 | 1 | 3 | 3 | 1 | 1 |
| c | 1 | 2 | 3 | 3 | 1 | 1 |

Figure 3.8: Example of comparison of connectivity calculations ($\beta$ and $\gamma$) of three networks with different configurations, but the same number of edges $E$. Panels: (a) Network a; (b) Network b; (c) Network c; (d) Connectivity calculations (tabulated above).

2. Ignoring other operational factors of transit while estimating and ranking networks based on their connectivity ($\gamma$ and $\beta$) can generate inaccurate results. Operational factors can include frequency of routes, speed of routes, distance between vertices and capacity versus demand of links, among many others. In the case illustrated in Figure 3.9, two networks with the same configuration but different route frequencies $f$ between vertices $V_2$-$V_3$ are compared. As presented in Figure 3.9(c), the connectivity of both networks is the same, but in fact network (b) will have a higher connectivity, as route 4 (R4) is more frequent than route 3 (R3).

| Network | Single Edges $E^S$ | Multiple Edges $E^M$ | Edges $E$ | Vertices $V$ | Degree of Connectivity $\gamma$ | Complexity $\beta$ |
|---|---|---|---|---|---|---|
| a | 3 | 0 | 3 | 3 | 1 | 1 |
| b | 3 | 0 | 3 | 3 | 1 | 1 |

Figure 3.9: Example of comparison of connectivity calculations ($\beta$ and $\gamma$) for two networks with the same configuration but different route frequencies $f$. Panels: (a) Network a; (b) Network b; (c) Summary of connectivity calculations (tabulated above).

3.5 Proposed Transit Indicators

As explained with the simple example networks from the previous section, the calculation and ranking of networks based on traditional connectivity indicators ($\beta$ and $\gamma$) can generate misleading outcomes. Thus, a connectivity indicator which evaluates configuration as well as transit operational factors was developed to overcome the limitations of the traditional indicators. Based on the information and time resources available for this research, only the frequency $f$ was considered as a transit operational factor.

The proposed way to include the frequency $f$ of bus routes in the connectivity calculations creates a new variable called “number of edges normalized by frequency”, symbolized as $E^f$. The concept of “number of edges normalized by frequency” is similar to a weighted average of the number of edges based on the frequency of the bus routes. Then, using the “number of edges normalized by frequency” as input, the connectivity indicators were re-calculated using two different methods:

• Method 1: The number of edges $E$ was replaced by $E^f$ in the formulas for $\beta$ (Equation 2.8) and $\gamma$ (Equation 2.6), obtaining:

$$\beta' = \frac{E^f}{V} \tag{3.1}$$

$$\gamma' = \frac{E^f}{3(V-2)} \tag{3.2}$$

• Method 2: Estimating connectivity as a weighted value of the traditional connectivity indicators.
It is the product of the number of edges normalized by frequency $E^f$ and the connectivity of the network based solely on configuration, so that both are counted equally. Mathematically this is expressed as follows:

$$\beta'' = E^f \cdot \beta(E^S) = E^f \cdot \left(\frac{E^S}{V}\right) \tag{3.3}$$

$$\gamma'' = E^f \cdot \gamma(E^S) = E^f \cdot \left(\frac{E^S}{3(V-2)}\right) \tag{3.4}$$

Where $\beta(E^S)$ and $\gamma(E^S)$ represent the connectivity indicators $\beta$ and $\gamma$ computed using the number of single edges $E^S$ instead of the total number of edges $E$ (which includes the number of multiple edges $E^M$ as well). As discussed in the previous section, including $E^M$ leads to inadequate values of connectivity.

The mathematical expression for the normalized number of edges $E_m^f$ for a traffic analysis zone $m$ would be as follows:

$$E_m^f = \frac{\sum_{l=1}^{l}\left(0.5 \cdot \left(\sum_{k=1}^{q}\sum_{j=1}^{p}\sum_{i=1}^{n} f_{ijk}\right)\right)}{f_{max}} \tag{3.5}$$

Where $f_{ijk}$ is the frequency of the $k$-th route linking vertices from $V_i$ to $V_j$. The sum of frequencies in the edges $\sum_k \sum_j \sum_i f_{ijk}$ is multiplied by 0.5 because (as stated before in Section 3.3) one directed edge is counted as half of an edge. Each 0.5 therefore counts one directed edge while the “weighted average” by frequency is performed. Finally, $f_{max}$ is the maximum sum of frequencies of any pair of vertices $V_i$ and $V_j$ in the whole network (among all TAZs), expressed mathematically as:

$$f_{max} = \max\left[f_{max}^m\right] = \max\left[\sum_{k=1}^{q}\left(f_{ijk} + f_{jik}\right)\right] \tag{3.6}$$

Where $f_{max}^m$ is the maximum sum of frequencies of any pair of vertices $V_i$ and $V_j$ in zone $m$. The maximum sum of frequencies $f_{max}$ therefore corresponds to the pair of vertices $V_i$ and $V_j$ with the highest number of route vehicles traveling between them. We obtain $f_{max}$ by comparing pairs of vertices in the whole network (all traffic analysis zones) so that the resulting indicators (i.e. $E^f$, $\beta'$, $\gamma'$, $\beta''$, $\gamma''$) are comparable with each other.
To better understand the concepts of maximum sum of frequencies $f_{max}$ and normalized number of edges $E^f$, an example calculation for a network consisting of two zones (see Figure 3.10) will be performed. Note that the networks in zones A and B have nearly the same configuration and bus route frequencies; the only difference is that the network in zone A has an additional route (R4) going from vertex $V_3$ to vertex $V_1$. Therefore, we expect the network in zone A to have a higher $E^f$ than the network in zone B.

First, the maximum sum of frequencies $f_{max}$ is estimated and presented in Table 3.3. From the calculations of $f_{max}$ it can be seen that the pair of vertices with the highest number of traveling route-vehicles per hour (i.e. the maximum sum of frequencies) are vertices $V_1$ and $V_2$ located in zone A (or vertices $V_4$ and $V_5$ located in zone B).

Figure 3.10: Example of a network composed of zones A and B. Panels: (a) Zone A; (b) Zone B.

| Zone | Pair of vertices | $\sum_k (f_{ijk} + f_{jik})$ (veh/hr) |
|---|---|---|
| A | $V_1 - V_2$ | $f_{12(R1)} + f_{21(R1)} = 5 + 5 = 10$ |
| A | $V_2 - V_3$ | $f_{23(R3)} + f_{32(R4)} = 5 + 2 = 7$ |
| A | $V_1 - V_3$ | $f_{31(R2)} = 7$ |
| A | | $f_{max}^A = 10$ |
| B | $V_4 - V_5$ | $f_{45(R5)} + f_{54(R5)} = 5 + 5 = 10$ |
| B | $V_5 - V_6$ | $f_{56(R6)} + f_{65(R7)} = 5 + 2 = 7$ |
| B | | $f_{max}^B = 10$ |
| | | $f_{max} = 10$ |

Table 3.3: Calculation of $f_{max}$ for the example in Figure 3.10

Finally, the number of edges normalized by frequency $E^f$ is obtained for both zones as follows:

$$E_A^f = \frac{0.5 \cdot (5 + 5 + 7 + 5 + 2)}{10} = 1.2 \text{ edges}$$

$$E_B^f = \frac{0.5 \cdot (5 + 5 + 5 + 2)}{10} = 0.85 \text{ edges}$$

From the calculations it can be seen that $E^f$ is higher for the network in zone A than in zone B, which was what we expected from the initial analysis.

Figure 3.11 shows another example network composed of 7 traffic analysis zones. We will estimate three types of connectivity indicators: traditional ($\beta$ and $\gamma$), configuration based ($\beta(E^S)$ and $\gamma(E^S)$) and proposed ($\beta'$, $\gamma'$, $\beta''$ and $\gamma''$). Then, we will rank the zonal networks according to the three types of indicators.
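The two-zone calculation above can be sketched in a few lines of Python. The directed edges and frequencies are those listed in Table 3.3; the function and variable names are illustrative assumptions. Applying Equation 3.5 as written, with the factor of 0.5 per directed edge, the sketch gives 1.2 edges for zone A and 0.85 edges for zone B, so zone A still ranks above zone B.

```python
def pair_frequency_sums(edges):
    """Sum route frequencies over both directions for each vertex pair."""
    sums = {}
    for origin, dest, _route, freq in edges:
        pair = frozenset((origin, dest))
        sums[pair] = sums.get(pair, 0) + freq
    return sums


def normalized_edges(edges, f_max):
    """E^f for one zone: each directed edge counts as 0.5, weighted by
    its frequency and divided by the network-wide f_max (Equation 3.5)."""
    return sum(0.5 * freq for _o, _d, _r, freq in edges) / f_max


# Directed edges (origin, destination, route, veh/hr) from Table 3.3.
zones = {
    "A": [("V1", "V2", "R1", 5), ("V2", "V1", "R1", 5),
          ("V2", "V3", "R3", 5), ("V3", "V2", "R4", 2),
          ("V3", "V1", "R2", 7)],
    "B": [("V4", "V5", "R5", 5), ("V5", "V4", "R5", 5),
          ("V5", "V6", "R6", 5), ("V6", "V5", "R7", 2)],
}

# Zonal maxima first, then the maximum over all zones (Equation 3.6).
f_max_by_zone = {z: max(pair_frequency_sums(e).values())
                 for z, e in zones.items()}
f_max = max(f_max_by_zone.values())

e_f = {z: normalized_edges(e, f_max) for z, e in zones.items()}
print(f_max_by_zone, f_max, e_f)
```

Computing the zonal maxima before the global maximum mirrors the two-level definition of $f_{max}$, which is what keeps $E^f$ comparable across zones.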
Note that for the particular case of networks having three vertices (which is our example), $\beta = \gamma$, $\beta' = \gamma'$ and $\beta'' = \gamma''$. As such, we will only present the results for $\gamma$, $\gamma'$ and $\gamma''$. The frequencies of the bus routes are as follows:

R1: $f_1 = 5$ veh/hr
R2: $f_2 = 7$ veh/hr
R3: $f_3 = 3$ veh/hr
R4: $f_4 = 6$ veh/hr

Intuitively, or just by visual inspection, the following relationships between the zonal networks could be established:

• Connectivity of network in zone B > Connectivity of network in zone A
• Connectivity of network in zone B > Connectivity of network in zone C
• Connectivity of network in zone C > Connectivity of network in zone D
• Connectivity of network in zone E > Connectivity of network in zone C
• Connectivity of network in zone B > Connectivity of network in zone E
• Connectivity of network in zone F > Connectivity of network in zone E
• Connectivity of network in zone G > Connectivity of network in zone B

First, the calculations for the maximum sums of frequencies $f_{max}^m$ and $f_{max}$ are shown in Table 3.4(a). According to the results, among all zones, zone F has the highest number of route vehicles traveling between a pair of vertices, $V_{16}$ and $V_{17}$ (as does zone G between vertices $V_{19}$ and $V_{20}$), giving $f_{max} = 22$ veh/hr.

With the value of $f_{max}$ and the number of edges normalized by frequency $E^f$, the proposed connectivity indicators ($\gamma'$ and $\gamma''$) can be estimated. A summary with the measurements of the networks ($V$, $E^S$, $E^M$), the calculations of the traditional indicator ($\gamma$) and the proposed connectivity indicators ($\gamma'$ and $\gamma''$) is presented in Table 3.5. In addition, the connectivity relationships established above are checked against the calculated values (see Table 3.4(b)). Finally, a ranking of the networks according to the various indicators is shown in Table 3.4(c).

From the results, it can be seen that in the case of traditional indicators the first and second types of errors described previously in Section 3.4 appear.
By including the number of multiple edges $E^M$ in the calculation of $\gamma$, the configuration of the network is not really evaluated, which leads to inadequate results. In this example the value of $\gamma$ indicates that networks B, E and F have the same connectivity, which is incorrect (connectivity of B > F > E). However, looking solely at the configuration using $\gamma(E^S)$ overlooks networks having multiple edges, which provide better connectivity than networks having only single edges. In this example, a ranking based on $\gamma(E^S)$ will consider networks C, E and F to have an equal level of connectivity, despite the fact that networks E and F have one more route (one edge) than network C, thus offering higher connectivity. A similar analysis can be applied when comparing networks B and G.

On the other hand, the proposed indicators ($\gamma'$ and $\gamma''$) seem to produce reasonable results. They preserve the connectivity relationships established before starting the calculations, except in the case of $\gamma'$ when comparing the connectivity of the networks in zones B and E. So initially, $\gamma''$ would seem a better connectivity indicator. However, further calculations of $\gamma'$ and $\gamma''$ in all the GVRD zones must be performed before reaching a final conclusion.

The estimation of the proposed connectivity indicators for the 577 GVRD TAZs was done in two steps. First, all transit indicators were estimated for the zonal graphs obtained solely from the transit network. Then, walking links were added and the vertices connected by walking links were merged. Indicators were estimated again, now accounting for the walking connections. Finally, the first and second rounds of indicators were contrasted.

Figure 3.11: Example of a network composed of seven zones. Panels: (a) Zone A; (b) Zone B; (c) Zone C; (d) Zone D; (e) Zone E; (f) Zone F; (g) Zone G.
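The entries of Table 3.5 and the comparison in Table 3.4(b) can be cross-checked with a short sketch. It recomputes $\gamma$, $\gamma(E^S)$, $\gamma'$ and $\gamma''$ from the tabulated $E^S$, $E^M$, $V$ and $E^f$ values, using $\gamma = E/(3(V-2))$ from Table 3.2, and counts how many of the seven expected relationships each indicator fails to reproduce. The names used in the code are illustrative assumptions, and because the $E^f$ inputs are rounded to three decimals the recomputed values are approximate.

```python
# Zone measurements copied from Table 3.5: (E_S, E_M, V, E_f).
ZONES = {
    "A": (1.5, 0, 3, 0.341),
    "B": (3.0, 0, 3, 0.682),
    "C": (2.0, 0, 3, 0.545),
    "D": (1.0, 0, 3, 0.273),
    "E": (2.0, 1, 3, 0.682),
    "F": (2.0, 1, 3, 0.818),
    "G": (3.0, 1, 3, 0.955),
}

# Expected relationships: first zone is better connected than the second.
EXPECTED = [("B", "A"), ("B", "C"), ("C", "D"), ("E", "C"),
            ("B", "E"), ("F", "E"), ("G", "B")]


def gamma(edges, vertices):
    """Degree of connectivity, gamma = E / (3 (V - 2))."""
    return edges / (3 * (vertices - 2))


def indicators(e_s, e_m, v, e_f):
    """Traditional, configuration-based and proposed indicators."""
    return {
        "gamma": gamma(e_s + e_m, v),               # traditional
        "gamma_ES": gamma(e_s, v),                  # configuration only
        "gamma_prime": gamma(e_f, v),               # Method 1
        "gamma_double_prime": e_f * gamma(e_s, v),  # Method 2
    }


values = {zone: indicators(*m) for zone, m in ZONES.items()}

# An expected relationship fails when the first zone is not strictly higher.
errors = {
    name: sum(1 for hi, lo in EXPECTED
              if not values[hi][name] > values[lo][name])
    for name in values["A"]
}
print(errors)
```

Under these inputs the failure counts come out as 2, 3, 1 and 0 respectively, which agrees with the "Errors" row of Table 3.4(b).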
(a) Calculation of maximum sum of frequencies

| Zone | Pair of vertices | $f_{max}^m$ (veh/hr) |
|---|---|---|
| A | $V_1 - V_3$ | 7 |
| B | $V_4 - V_6$ | 14 |
| C | $V_7 - V_9$ | 14 |
| D | $V_{10} - V_{12}$ | 7 |
| E | $V_{13} - V_{14}$ | 16 |
| F | $V_{16} - V_{17}$ | 22 |
| G | $V_{19} - V_{20}$ | 22 |
| | | $f_{max} = 22$ |

(b) Connectivity relationships

| Zone vs. Zone | $\gamma$ | $\gamma(E^S)$ | $\gamma'$ | $\gamma''$ |
|---|---|---|---|---|
| B > A | True | True | True | True |
| B > C | True | True | True | True |
| C > D | True | True | True | True |
| E > C | True | False | True | True |
| B > E | False | True | False | True |
| F > E | False | False | True | True |
| G > B | True | False | True | True |
| Errors: | 2 | 3 | 1 | 0 |

(c) Zonal ranking based on connectivity indicators

| Zone | Ranking by $\gamma$ | Ranking by $\gamma(E^S)$ | Ranking by $\gamma'$ | Ranking by $\gamma''$ |
|---|---|---|---|---|
| A | 2 | 3 | 5 | 6 |
| B | 4 | 1 | 3 | 2 |
| C | 3 | 2 | 4 | 5 |
| D | 1 | 4 | 1 | 7 |
| E | 4 | 2 | 3 | 4 |
| F | 4 | 2 | 2 | 3 |
| G | 5 | 1 | 1 | 1 |

Table 3.4: Ranking of networks illustrated in Figure 3.11

| Zone | Single Edges $E^S$ | Multiple Edges $E^M$ | Edges $E$ | Vertices $V$ | Edges Norm. by Freq. $E_m^f$ | $\gamma$ | $\gamma(E^S)$ | $\gamma'$ | $\gamma''$ |
|---|---|---|---|---|---|---|---|---|---|
| A | 1.5 | 0 | 1.5 | 3 | 0.341 | 0.500 | 0.500 | 0.114 | 0.170 |
| B | 3 | 0 | 3 | 3 | 0.682 | 1.000 | 1.000 | 0.227 | 0.682 |
| C | 2 | 0 | 2 | 3 | 0.545 | 0.667 | 0.667 | 0.182 | 0.364 |
| D | 1 | 0 | 1 | 3 | 0.273 | 0.333 | 0.333 | 0.091 | 0.091 |
| E | 2 | 1 | 3 | 3 | 0.682 | 1.000 | 0.667 | 0.227 | 0.455 |
| F | 2 | 1 | 3 | 3 | 0.818 | 1.000 | 0.667 | 0.273 | 0.545 |
| G | 3 | 1 | 4 | 3 | 0.955 | 1.333 | 1.000 | 0.318 | 0.955 |

Table 3.5: Summary of network measurements and degree of connectivity for the network illustrated in Figure 3.11

3.6 Application of Transit Indicators to Ridership Models

Usually the goal of a public transportation system is to serve as many people as possible, that is, to achieve high ridership. Therefore, understanding how transit indicators relate to ridership, and which indicators are the most influential, can aid transit designs which maximize ridership levels (Derrible and Kennedy [5]).

In order to develop a relationship between ridership and transit indicators, ridership was defined using the data available for our case study (the GVRD) as the annual transit commuting trips per capita and symbolized as TCPC.
Ridership is traditionally expressed per capita, making it independent of the transit network size; this is important when comparing zonal-based networks of various sizes. TCPC can be written mathematically as:

$$TCPC = \frac{BUSP \cdot TCM \cdot 2\ \text{trips/day} \cdot 260\ \text{days}}{POP} \tag{3.7}$$

Where BUSP is the percentage of commuters using transit, TCM is the number of total daily commuters and POP is the population. This definition of ridership was developed based on the information available for the GVRD case study (the 1996 Census of Canada, see Section 3.1.3). It therefore has some limitations, as it only accounts for weekday home-based trips made for work or educational purposes and neglects other trips, such as shopping or recreational trips and trips made on weekends.

Ridership was estimated at the TAZ level so that it matches the transit indicators (also estimated at the TAZ level). Once all the data sets were combined (ridership and transit indicators), an outlier analysis was performed. The outlier analysis was needed because the ridership data comes from the 1996 Census, while the transit indicators were calculated based on the data provided by TransLink for the years 1998 and 2000. Thus, some inconsistencies resulting from the integration of data sets coming from different years and sources had to be removed (for example, zones with zero bus stops but a TCPC > 0).

After the outlier analysis, linear regressions based on single and multiple variables were attempted as follows:

• The model form used was:

$$y = a + \sum b_i x_i \tag{3.8}$$

Where $y = TCPC$, $a$ is the model intercept, the $x_i$'s represent the various independent variables and the $b_i$'s are the parameters of the corresponding independent variables.

• Independent variables having a strong correlation coefficient ($r > 0.4$) with the dependent variable (TCPC) were selected.
The transit indicators analyzed as potential independent variables included: 1) the proposed transit indicators ($\beta'$, $\gamma'$, $\beta''$ and $\gamma''$) and 2) the existent transit indicators listed in Table 3.2.

• A correlation matrix of all independent variables was calculated in order to identify which pairs were correlated ($r > 0.4$). In any single model, the set of variables included should be independent of each other to prevent multicollinearity.

• Multiple linear regression was performed using a forward stepwise procedure. First, only the independent variable with the strongest correlation $r$ was added to the model. Then, the independent variable with the second highest correlation $r$ was included in the model. The goodness of fit of the two models was compared. If the fit did not improve with the addition of the second variable, then the second variable was dropped; otherwise it was kept in the model. More independent variables were added or removed until the model with the best goodness of fit was achieved.

• In order to compare the goodness of fit between various models, the following was considered:

– Logical sign of model parameters.

– Significant t-statistics for model parameters: $t > 1.96$ for a confidence level of 95%. The t-statistic is defined as

$$t_i = \frac{b_i}{\sigma} \tag{3.9}$$

where $\sigma$ = standard error and $b_i$ = model parameter.

– Model intercept $a$: the lower the intercept, the better the fit.

– Sum of squared errors $SSE$: the lower the $SSE$, the better the fit of the model. The sum of squared errors is expressed mathematically as

$$SSE = \sum (y_i - \tilde{y}_i)^2 \tag{3.10}$$

where $y_i$ = the observed value and $\tilde{y}_i$ = the estimated value (using the model).

– Coefficient of determination $R^2$: defined as the proportion of variance of the dependent variable that can be explained by the independent variables. $R^2$ ranges between 0 and 1; the higher the $R^2$ value, the better the model.
Mathematically $R^2$ is defined as:

$$R^2 = \frac{\sum (\tilde{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} \tag{3.11}$$

where $y_i$ = the observed value, $\tilde{y}_i$ = the estimated value (using the model) and $\bar{y}$ = the mean of the observed values.

– Akaike's information criterion AIC: the AIC is defined as

$$AIC = N \cdot \ln\left(\frac{SSE}{N}\right) + 2K \tag{3.12}$$

Where $N$ is the number of observations, $SSE$ is the sum of squared errors and $K$ is the number of model parameters to be estimated. The lower the AIC value, the better the fit of the model.

3.7 Application of Transit Indicators to Collision Prediction Models

The collision prediction model form used was selected according to the available data and to a review of models from the literature (see Section 2.2). The model form chosen was as follows:

$$E(\Lambda) = a_0 Z^{a_1} \exp\left(\sum b_i x_i\right) \tag{3.13}$$

Where:
$E(\Lambda)$ = predicted collision frequency (over a period of three years).
$Z$ = external exposure variable.
$x_i$ = additional transit related variables, which can also explain collision occurrence.
$a_0$, $a_1$ and $b_i$ = parameters of the model.

The collision prediction models (CPMs) developed for the case of the GVRD were based on 479 urban TAZs. The following data was used:

• For $E(\Lambda)$, the number of total collisions (transit and auto) T3 was used. Auto to auto collisions are also included to capture collisions which may be caused by the presence of transit vehicles. T3 was provided by the Insurance Corporation of British Columbia (ICBC) for the years 1996, 1997 and 1998 (see Section 3.1.3). Unfortunately, the isolated number of transit collisions was not available. It would have been interesting to develop models relating transit collisions to transit related explanatory variables.

• Exposure $Z$ refers to variables required in order for collisions to happen; as described in the literature review, at zero exposure levels collisions must be zero as well.
In this research, models were developed based on three possible exposure variables: total transit and vehicle kilometers traveled VKT, total vehicle kilometers traveled VHKT and total lane kilometers TLKM. The exposure variables were obtained from the TransLink Emme2 transportation planning model in the AM Peak scenario for base year 1996 (see Section 3.1.3).

• Finally, the explanatory variables x_i consist of various transit related characteristics and indicators which seem likely to influence collision occurrence. The potential variables tested in the CPMs were grouped into four categories:

1. Transit Infrastructure Category: refers to bus system elements physically present in the roads. Examples include number of bus stops, location of bus stops and availability of priority bus or HOV lanes, among others.

2. Transit Network Topology Category: includes variables counting vertices, edges or loops (connectivity).

3. Transit Route Design Category: elements of the network design which can influence system performance, such as the number of routes, frequency, coverage and connectivity, among others.

4. Transit Performance and Operations Category: performance indicators reflect the way transit service goals and objectives are being achieved. Performance can be measured as the quality of the service provided and the usage of the service, among others. Operations indicators refer to transit facility management elements which can improve system performance. Examples of operations indicators include bus route frequencies and service time span.

When considering transit network indicators, it is important to recall the two cases studied: 1) a network based solely on transit links and 2) a network which accounts for the walking transfers between bus stops in addition to the transit links.
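The model form in Equation 3.13 can be illustrated with a short sketch. Taking logarithms gives ln E(Λ) = ln a_0 + a_1 ln Z + Σ b_i x_i, which is the log-linear form that a GLM fits. The function name, the parameter values and the zone data below are hypothetical and chosen only for illustration; they are not the fitted values from this study.

```python
import math


def predicted_collisions(a0, a1, exposure, b, x):
    """Evaluate E(Λ) = a0 * Z**a1 * exp(sum(b_i * x_i)) for one zone."""
    if len(b) != len(x):
        raise ValueError("b and x must have the same length")
    if exposure < 0:
        raise ValueError("exposure must be non-negative")
    linear_term = sum(b_i * x_i for b_i, x_i in zip(b, x))
    return a0 * exposure ** a1 * math.exp(linear_term)


# Hypothetical parameters and zone data (assumptions, not results of the thesis).
a0, a1 = 0.5, 0.8
b = [0.02, 0.3]   # illustrative coefficients for two explanatory variables
x = [12, 0.075]   # illustrative values, e.g. 12 bus stops and a connectivity of 0.075

print(predicted_collisions(a0, a1, exposure=1500.0, b=b, x=x))
# With a1 > 0, zero exposure yields zero predicted collisions.
print(predicted_collisions(a0, a1, exposure=0.0, b=b, x=x))
```

Because the exposure enters as a power term rather than inside the exponential, the prediction vanishes at zero exposure whenever a_1 is positive, which is the property required of Z in the text above.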
The full list of exposure variables Z and transit related explanatory variables x_i is presented in Table 3.6. Additionally, the summary statistics of the transit data and calculated indicators used as explanatory variables are shown in Tables 3.1, 4.4 and 4.5.

| Category | Variable | Description |
|---|---|---|
| Exposure | VKT | Total transit and vehicle kilometers traveled |
| Exposure | VHKT | Total vehicle kilometers traveled |
| Exposure | TLKM | Total lane kilometers |
| Transit Infrastructure | NSTOP | Number of bus stops |
| Transit Infrastructure | BDEN | Bus stop density |
| Transit Infrastructure | FS | Far-sided stops |
| Transit Infrastructure | MB | Midblock stops |
| Transit Infrastructure | NS | Near-sided stops |
| Transit Infrastructure | BUSHOV | Length of bus-only HOV lanes |
| Transit Infrastructure | TWOHOV | Length of 2+ HOV lanes |
| Transit Infrastructure | THRHOV | Length of 3+ HOV lanes |
| Transit Infrastructure | IS | Inter-station spacing |
| Transit Network Topology | β″ | Proposed complexity |
| Transit Network Topology | γ″ | Proposed degree of connectivity |
| Transit Network Topology | ρ | Structural connectivity |
| Transit Network Topology | η | Average edge length |
| Transit Network Topology | θ | Average length per vertex |
| Transit Route Design | NR | Number of routes |
| Transit Route Design | f_AM | Sum of AM peak frequencies |
| Transit Route Design | OD | Overlapping degree |
| Transit Route Design | IS | Inter-station spacing |
| Transit Route Design | σ | Coverage |
| Transit Route Design | β″ | Proposed complexity |
| Transit Route Design | γ″ | Proposed degree of connectivity |
| Transit Performance and Operations | LITA | Local Index of Transit Availability |
| Transit Performance and Operations | f_AM | Sum of AM peak frequencies |
| Transit Performance and Operations | TCPC | Transit commuting trips per capita |
| Transit Performance and Operations | TIME | Bus route service time span |

Table 3.6: Possible variables per zone to include in the CPMs

The CPMs were developed following the GLM approach suggested by Sawalha and Sayed [26]. The methodology was as follows:

• A Negative Binomial error structure was assumed.

• Model variables were selected with a forward stepwise procedure: the models were developed by adding one variable at a time and testing how the fit increased or decreased due to the added variable. Variables were added in order of their t-statistic, from highest to lowest.

• Whether to keep or remove a variable was decided based on three conditions. First, the parameter t-statistic had to be significant (|t| > 1.96 at the 95% confidence level). Second, the addition of the new variable had to generate a significant drop in the scaled deviance SD at the 95% level (a drop greater than 3.84, the χ² critical value for one degree of freedom). Third, the variable had to present a low correlation with the other independent variables already in the model.

• Once all variables were evaluated, the overall model fit was assessed using two statistical measures: the Pearson χ² and the scaled deviance SD. These statistical measures were explained in detail in Section 2.2 of the literature review.

• The fit of the model was improved by performing an outlier analysis based on Cook's distance method (see Section 2.2).

Total collision models were developed for the urban areas of the GVRD (479 zones out of a total of 577 zones), following a GLM procedure and based on: 1) four groups of explanatory transit related variables (infrastructure, network topology, route design, and performance and operations) and 2) two types of networks (networks considering only transit links, and networks considering walking transfers between bus stops plus transit links).

Chapter 4 Results and Discussion

4.1 The Greater Vancouver Regional District (GVRD) Case Study

4.1.1 The Metro Network

The map and the graph representation of the GVRD metro are shown in Figure 4.1.
As discussed in the literature review (Section 2.1.6), Derrible and Kennedy [5] developed three figures in order to compare different metro networks based on state, form and structure. The state figure shows which phase of development (maturity) the metro is in, based on the complexity β and degree of connectivity γ indicators. The form figure describes the spatial relationship between transit and the built environment, classifying metro networks as focused on providing good access regionally (trips between the suburbs and the city core), locally (trips inside the city core) or both. Finally, the structure figure contrasts networks based on the directness τ and structural connectivity ρ indicators, classifying them as oriented by directness, connectivity or both.

The basic graph theory indicators for the GVRD metro are listed in Table 4.1, and the GVRD metro data point was added to the figures produced by Derrible and Kennedy [5], as shown in Figures 4.2, 4.3 and 4.4.

| Indicator | Symbol | Value | Equation Ref. |
|---|---|---|---|
| Degree of connectivity | γ | 0.56 | 2.6 |
| Complexity | β | 1.25 | 2.8 |
| Structural connectivity | ρ | 0.75 | 2.47 |
| Directness | τ | 1.33 | 2.45 |
| Average line length (km) | A | 24.92 | |

Table 4.1: Basic graph theory indicators for the GVRD metro system

Figure 4.1: GVRD metro: (a) map, (b) graph representation

From Figure 4.2 it can be seen that the GVRD metro is located on the boundary between Phases I and II, meaning that the system is fairly recent and still in the process of developing. In contrast, the metro networks of other Canadian cities (Montreal and Toronto) are in Phase I; these networks are relatively new and have not started growing yet. Most metro networks in the United States (New York, Chicago and Washington D.C.) are already in Phases II or III, which include networks that are growing or have already expanded significantly.
In addition, Figure 4.3 shows that the GVRD metro has a small number of stations NS but a fairly long average line length A; it is therefore classified as “Regionally Accessible”, meaning it focuses mostly on connecting trips between the suburbs and the city core. The metro networks of other Canadian cities (Montreal and Toronto) and American cities (Boston and Washington D.C.) are also classified as “Regionally Accessible”, although the Montreal and Toronto networks have more stations NS and a shorter average line length A than the GVRD metro. The Montreal and Toronto networks could almost be classified in the “Local Coverage” category, which is designated for networks specialized in serving trips inside the city core. The Chicago metro network was the only North American system classified in the “Regional Coverage” category, which applies to networks serving internal trips in the city core as well as trips connecting the city core and the suburbs.

Finally, the Vancouver metro has the lowest structural connectivity ρ of all networks studied, and it also has a very low directness τ. According to Figure 4.4 it can be classified as “Connectivity Oriented”. The Toronto metro network has the second lowest values of structural connectivity ρ and directness τ, while the Montreal and Boston metro networks have an intermediate value of structural connectivity ρ and a low value of directness τ. Like the GVRD metro network, the Boston, Montreal and Toronto metro networks are classified as “Connectivity Oriented”. In contrast, the Washington D.C. and Chicago metro networks present higher directness τ and lower structural connectivity ρ, and are therefore classified as “Directness Oriented”. Of the North American cities, only the New York metro presented intermediate values of both directness τ and structural connectivity ρ, so it can be considered “Integrated Oriented”.
Integrated Oriented networks are focused on providing both connectivity and directness.

Figure 4.2: Vancouver metro plotted as part of the state graph

Figure 4.3: Vancouver metro plotted as part of the form graph

Figure 4.4: Vancouver metro plotted as part of the structure graph

4.1.2 The Bus System

Analysis of Sample Zones

First, some of the simpler zonal graphs from eight TAZs in the GVRD (see Figure 4.5) were analyzed more closely using the traditional connectivity indicators (β and γ), the configuration-only indicators (β(E_S) and γ(E_S)) and the proposed indicators (β′, β″, γ′ and γ″). A summary of the indicator calculations is shown in Table 4.2, and the ranking of the eight TAZs according to the different types of connectivity is presented in Table 4.3.

As explained in the methodology (Section 3.4), traditional indicators account for the existence of multiple edges E_M. However, including E_M without also evaluating the frequency of the bus routes represented by the links can generate results which tend to overestimate connectivity. For example, zone 7100 seems to have higher values of connectivity (γ and β) than zone 2250, because zone 7100 has more multiple edges than zone 2250. But comparing the overall bus route frequencies, zone 2250 has a higher frequency than zone 7100 (21 versus 15 buses per hour, Table 4.2). Thus, traditional connectivity indicators can give a misleading measurement.

On the other hand, considering solely configuration to measure connectivity (γ(E_S) and β(E_S)) ignores the existence of multiple edges E_M. The frequencies of the bus routes are not considered either. According to the results based on γ(E_S) and β(E_S) for the GVRD case, zones 2250, 6790 and 7100 have equal values of connectivity.
However, in reality the connectivity of these zones is ordered 2250 > 7100 > 6790. The same problem can be observed when comparing zones 3120 and 7530, and zones 5400 and 8670.

In order to incorporate network configuration, the existence of multiple edges and the effect of bus route frequencies into a single connectivity measurement, two methods and four indicators were proposed (γ′, β′, γ″ and β″). The two methods and the proposed indicators were explained in detail in Section 3.5. The connectivity values and rankings obtained for the eight GVRD TAZs with method 1 (β′ and γ′) are slightly different from those obtained with method 2 (β″ and γ″). Method 2 seems to deliver connectivity rankings which are more consistent with the expected intuitive results. For example, method 2 ranks the network in zone 8670 third and the network in zone 6790 fourth, while method 1 ranks them inversely. Visual inspection makes clear that network 8670 is more complex and connected than network 6790. Another failure of method 1 is that it ranks network 5640 last (as the least connected), while method 2 ranks network 3120 last. Again by visual inspection, network 5640 offers more connections and is more complex than network 3120, so it should not be placed as the worst. Based on the connectivity rankings for the eight GVRD TAZs, and on the results of the example developed in the methodology (Section 3.5, see Figure 3.11), method 2 seems to produce more adequate results so far.

Figure 4.5: Some zonal graphs obtained from the Greater Vancouver Regional District (GVRD), panels (a) to (h)

| Zone | Total freq. (bus/hr) | Vertices V | Single edges E_S | Multiple edges E_M | Edges norm. by freq. E_f | β | β(E_S) | β′ | β″ | γ | γ(E_S) | γ′ | γ″ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6790 | 8 | 3 | 1 | 0 | 0.127 | 0.333 | 0.333 | 0.042 | 0.042 | 0.333 | 0.333 | 0.042 | 0.042 |
| 2250 | 21 | 3 | 1 | 0 | 0.333 | 0.333 | 0.333 | 0.111 | 0.111 | 0.333 | 0.333 | 0.111 | 0.111 |
| 7100 | 15 | 3 | 1 | 1 | 0.238 | 0.667 | 0.333 | 0.079 | 0.079 | 0.667 | 0.333 | 0.079 | 0.079 |
| 3120 | 8 | 4 | 1 | 0 | 0.127 | 0.250 | 0.250 | 0.032 | 0.032 | 0.167 | 0.167 | 0.021 | 0.021 |
| 7530 | 9 | 4 | 1 | 0.5 | 0.143 | 0.375 | 0.250 | 0.036 | 0.036 | 0.250 | 0.167 | 0.024 | 0.024 |
| 8670 | 6 | 4 | 1.5 | 0.5 | 0.190 | 0.500 | 0.375 | 0.048 | 0.071 | 0.333 | 0.250 | 0.032 | 0.048 |
| 5400 | 4 | 4 | 1.5 | 0.5 | 0.095 | 0.500 | 0.375 | 0.024 | 0.036 | 0.333 | 0.250 | 0.016 | 0.024 |
| 5640 | 8 | 5 | 2 | 0 | 0.127 | 0.400 | 0.400 | 0.025 | 0.051 | 0.222 | 0.222 | 0.014 | 0.028 |

Table 4.2: Summary of measurements and connectivity indicators for eight TAZs in the GVRD (β columns: complexity; γ columns: degree of connectivity)

| Zone | β | β(E_S) | β′ | β″ | γ | γ(E_S) | γ′ | γ″ |
|---|---|---|---|---|---|---|---|---|
| 6790 | 5 | 3 | 4 | 5 | 2 | 1 | 3 | 4 |
| 2250 | 5 | 3 | 1 | 1 | 2 | 1 | 1 | 1 |
| 7100 | 1 | 3 | 2 | 2 | 1 | 1 | 2 | 2 |
| 3120 | 6 | 4 | 6 | 7 | 5 | 4 | 6 | 7 |
| 7530 | 4 | 4 | 5 | 6 | 3 | 4 | 5 | 6 |
| 8670 | 2 | 2 | 3 | 3 | 2 | 2 | 4 | 3 |
| 5400 | 2 | 2 | 8 | 6 | 2 | 2 | 7 | 6 |
| 5640 | 3 | 1 | 7 | 4 | 4 | 3 | 8 | 5 |

Table 4.3: Ranking of the eight TAZs in the GVRD according to various connectivity indicators (β columns: complexity; γ columns: degree of connectivity)

Analysis of the Whole Network

The assessment of all the transit indicators was performed for all 577 TAZs in the GVRD. As explained in the methodology section, the indicators were calculated by two methods and for two cases:

• Method 1: the proposed connectivity indicators were calculated by replacing the number of edges E with the number of edges normalized by frequency E_f in the formulas for β (Equation 2.8) and γ (Equation 2.6). Equations 3.5 and 3.5, explained previously in the methodology section, were used.

• Method 2: the connectivity indicators were estimated as the product of the number of edges normalized by frequency E_f and the connectivity of the network (based solely on configuration). Equations 3.5 and 3.5, explained previously in the methodology section, were applied.
• Case 1, only bus connections: the indicators were determined by accounting only for the links of the bus routes. The results are presented in Table 4.4.

• Case 2, bus connections and walking transfers between stops: the calculation of the indicators took into account the links from bus routes and the links created by walking transfers between bus stops (see Table 4.5). It is important to clarify that the zonal graphs analyzed in this second case have the walking links added, and the vertices connected by those walking links merged together.

Figures 4.6, 4.7, 4.8 and 4.9 illustrate the spatial distribution of the proposed connectivity indicators (method 1: β′ and γ′; method 2: β″ and γ″) across the GVRD for both cases. The difference in the indicators between cases 1 and 2 arises because two changes occur in the latter case: the number of vertices V decreases and the number of edges E increases. This leads to higher values of the proposed connectivity indicators and the θ index, but lowers the value of the η index in case 2 (compare Tables 4.4 and 4.5). The coverage σ, the overlapping degree OD and the inter-station spacing IS remain constant in both cases.

In case 1 (only bus-to-bus connections), medium to high connectivity is found inside the CBD and in the west part of the region (basically inside the Vancouver municipality). Medium to low connectivity values appear gradually when approaching the surrounding suburban areas. In case 2 (bus-to-bus and bus-walking-bus connections), mostly high connectivity is found inside the CBD and in the west and north-east parts of the region (basically the municipalities of Vancouver and Burnaby), while medium to low connectivity appears progressively in the surrounding suburban areas.
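The two calculation methods described above can be sketched in a few lines. The sketch assumes β = E/V and γ = E/(3(V − 2)) for the traditional indicators, which reproduces the values reported for the sample zones in Table 4.2; the function names, the dictionary keys and the choice to return 0.0 for networks too small to evaluate are assumptions made for illustration only.

```python
def beta(edges, vertices):
    """Complexity: number of edges per vertex (0.0 for an empty network, by assumption)."""
    return edges / vertices if vertices > 0 else 0.0


def gamma(edges, vertices):
    """Degree of connectivity: edges divided by the planar maximum 3(V - 2)."""
    return edges / (3 * (vertices - 2)) if vertices > 2 else 0.0


def connectivity_indicators(v, e_s, e_m, e_f):
    """Return traditional, configuration-only and proposed indicators for one zone.

    v: vertices, e_s: single edges, e_m: multiple edges,
    e_f: number of edges normalized by frequency.
    """
    beta_cfg = beta(e_s, v)
    gamma_cfg = gamma(e_s, v)
    return {
        "beta": beta(e_s + e_m, v),        # traditional, counts multiple edges
        "beta_cfg": beta_cfg,              # configuration only, β(E_S)
        "beta_m1": beta(e_f, v),           # method 1: E replaced by E_f
        "beta_m2": e_f * beta_cfg,         # method 2: E_f times β(E_S)
        "gamma": gamma(e_s + e_m, v),
        "gamma_cfg": gamma_cfg,
        "gamma_m1": gamma(e_f, v),
        "gamma_m2": e_f * gamma_cfg,
    }


# Zone 8670 from Table 4.2: V = 4, E_S = 1.5, E_M = 0.5, E_f = 0.190
zone_8670 = connectivity_indicators(v=4, e_s=1.5, e_m=0.5, e_f=0.190)
for name, value in zone_8670.items():
    print(f"{name}: {value:.3f}")
```

Method 1 and method 2 coincide whenever E_S = 1, since both then reduce to E_f divided by the same denominator; they diverge only for zones with more than one single edge, such as 8670, 5400 and 5640 in Table 4.2.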
Comparing case 1 with case 2, and method 1 with method 2, various observations can be made:

• Connectivity values in case 2 are always higher than in case 1 for all proposed indicators.

• High values of connectivity spread over more TAZs in case 2 than in case 1.

• In general, the connectivity values decrease gradually when moving away from the CBD and the west part of the region (the municipality of Vancouver) towards the suburban areas.

• Connectivity values are more consistent with intuitive expectations when applying method 2 (β″ and γ″) (see Section 4.1.2). Thus, the method 2 indicators (β″ and γ″) are used for further applications in this study.

| Variable | Description | Total GVRD | Zonal Average | St. Dev. | Min. | Max. |
|---|---|---|---|---|---|---|
| Topological Indicators | | | | | | |
| NSTOP | Number of Stops | 7866 | 13.378 | 10.513 | 0 | 58 |
| NR | Number of Routes | 8164 | 13.884 | 24.453 | 0 | 166 |
| E | Edges | 4813.5 | 9.784 | 10.106 | 0 | 69.5 |
| E_S | Single Edges | 2951 | 5.998 | 4.855 | 0 | 32.5 |
| E_M | Multiple Edges | 1862.5 | 3.786 | 6.214 | 0 | 44.5 |
| V | Vertices | 7179 | 14.591 | 9.050 | 0 | 51 |
| V_T | Transfer | 3639 | 7.396 | 5.786 | 0 | 34 |
| V_E | End | 3540 | 7.195 | 4.178 | 0 | 26 |
| ρ | Struct. Connectivity | - | 0.837 | 0.993 | 0 | 5.75 |
| β | Complexity | - | 0.574 | 0.298 | 0 | 1.875 |
| β(E_S) | Complexity (only configuration) | - | 0.377 | 0.099 | 0 | 0.893 |
| β′ | Proposed Complexity 1 | - | 0.027 | 0.020 | 0 | 0.132 |
| β″ | Proposed Complexity 2 | - | 0.201 | 0.286 | 0 | 2.629 |
| γ | Deg. of Connectivity | - | 0.231 | 0.119 | 0 | 0.833 |
| γ(E_S) | Deg. of Connectivity (only configuration) | - | 0.156 | 0.056 | 0 | 0.347 |
| γ′ | P. D. Connectivity 1 | - | 0.011 | 0.008 | 0 | 0.065 |
| γ″ | P. D. Connectivity 2 | - | 0.075 | 0.101 | 0 | 0.919 |
| Geographical Indicators | | | | | | |
| η | Average Edge Length | - | 1.371 | 2.850 | 0 | 40.521 |
| θ | Theta | - | 0.623 | 0.987 | 0 | 12.156 |
| OD | Overlapping Degree | - | 0.375 | 0.297 | 0 | 0.978 |
| IS | Inter-station Spacing | - | 1.410 | 6.591 | 50 | 89.506 |
| Other Indicators | | | | | | |
| σ | Coverage | - | 0.0869 | 0.0857 | 0 | 0.684 |
| LITA | L. I. Transit Avail. | - | 2.989 | 1.419 | 0 | 5 |

Table 4.4: Summary of transit indicator statistics for the GVRD, considering only the bus links (without walking links)

| Variable | Description | Total GVRD | Zonal Average | St. Dev. | Min. | Max. |
|---|---|---|---|---|---|---|
| Topological Indicators | | | | | | |
| NSTOP | Number of Stops | 7866 | 13.378 | 10.513 | 0 | 58 |
| NR | Number of Routes | 8164 | 13.884 | 24.453 | 0 | 166 |
| E | Edges | 6949.5 | 14.125 | 12.459 | 0 | 87.5 |
| E_S | Single Edges | 5102 | 10.370 | 7.373 | 0 | 41 |
| E_M | Multiple Edges | 1847.5 | 3.755 | 6.233 | 0 | 51.5 |
| V | Vertices | 6882 | 13.988 | 8.415 | 0 | 50 |
| V_T | Transfer | 4596 | 9.341 | 6.356 | 0 | 37 |
| V_E | End | 2286 | 4.646 | 2.739 | 0 | 18 |
| ρ | Struct. Connectivity | - | 0.723 | 0.839 | 0 | 4.5 |
| β | Complexity | - | 0.907 | 0.340 | 0 | 2.571 |
| β(E_S) | Complexity (only configuration) | - | 0.704 | 0.154 | 0 | 1.727 |
| β′ | Proposed Complexity 1 | - | 0.028 | 0.021 | 0 | 0.142 |
| β″ | Proposed Complexity 2 | - | 0.359 | 0.486 | 0 | 4.284 |
| γ | Deg. of Connectivity | - | 0.374 | 0.157 | 0 | 0.958 |
| γ(E_S) | Deg. of Connectivity (only configuration) | - | 0.295 | 0.111 | 0 | 0.704 |
| γ′ | P. D. Connectivity 1 | - | 0.011 | 0.009 | 0 | 0.065 |
| γ″ | P. D. Connectivity 2 | - | 0.136 | 0.173 | 0 | 1.507 |
| Geographical Indicators | | | | | | |
| η | Average Edge Length | - | 0.793 | 1.487 | 0 | 20.261 |
| θ | Theta | - | 0.642 | 1.001 | 0 | 12.156 |
| OD | Overlapping Degree | - | 0.375 | 0.297 | 0 | 0.978 |
| IS | Inter-station Spacing | - | 1.410 | 6.591 | 50 | 89.506 |
| Other Indicators | | | | | | |
| σ | Coverage | - | 0.0869 | 0.0857 | 0 | 0.684 |
| LITA | L. I. Transit Avail. | - | 0.256 | 0.313 | 0 | 1 |

Table 4.5: Summary of transit indicator statistics for the GVRD, considering bus and walking links

Figure 4.6: Spatial distribution of the β′-index in the GVRD: (a) only transit, (b) transit and walking

Figure 4.7: Spatial distribution of the β″-index in the GVRD: (a) only transit, (b) transit and walking

Figure 4.8: Spatial distribution of the γ′-index in the GVRD: (a) only transit, (b) transit and walking

Figure 4.9: Spatial distribution of the γ″-index in the GVRD: (a) only transit, (b) transit and walking

4.2 Application of Transit Indicators in Ridership Models in the GVRD Bus System

In order to study the relationship between ridership and transit indicators, each indicator was first analyzed separately. Then the indicators were grouped in multiple linear regressions.

4.2.1 Single Linear Regression Models

Although various transit indicators (the proposed indicators and the existing indicators listed in Table 3.2) were evaluated, only six indicators presented a fairly strong positive correlation with transit ridership as defined in Section 3.6. The indicators are listed below:

1. Connectivity Indicators

As explained in the methodology (Section 3.6), the connectivity indicators were analyzed for two types of networks: 1) networks considering only transit links and 2) networks considering transit links and walking transfers to other bus stops. The connectivity indicators which presented a strong correlation with ridership were the structural connectivity ρ, the proposed complexity β″ and the proposed degree of connectivity γ″. Each indicator and its relationship with ridership is explained below.

(a) Proposed complexity β″ and proposed degree of connectivity γ″: these indicators are an adaptation of the traditional graph theory indicators. They include configuration as well as the frequency of bus routes in order to evaluate connectivity. The proposed connectivity indicators depend on the bus route frequency, the total number of edges (single and multiple) and the number of nodes. The detailed definition can be found in the methodology (Section 3.5). Figures 4.10 and 4.11 illustrate the positive linear relation between the proposed connectivity indicators and ridership.
The positive relationship is logical, as either the addition of new links (bus routes) or an increase in bus route frequency will tend to create more ridership. One interesting result can be observed in Figures 4.10 and 4.11: the slope for case 2 (considering walking and transit links) is approximately half the slope for case 1 (only considering transit links). However, the values of the proposed connectivity indices are always higher in case 2 than in case 1. This implies that the higher connectivity values obtained by acknowledging the presence of walking transfers translate into higher ridership. Therefore, case 2 gives a more adequate value for connectivity, as the lower model slope captures less variability from other variables which affect ridership but are not included in the model (e.g. socioeconomic variables, transit elements).

Figure 4.10: Annual commuting transit trips per capita versus proposed complexity in the GVRD, panels (a) and (b)

Figure 4.11: Annual commuting transit trips per capita versus proposed complexity in the GVRD, panels (a) and (b)

(b) Structural connectivity ρ: depends on the number of transfer possibilities in a network, the number of multiple edges and the number of transfer vertices (Section 2.1.6). The plot of structural connectivity versus ridership is presented in Figure 4.12 for two cases: 1) only transit links and 2) walking and transit links. In the plots, the positive linear nature of the relationship can be observed. The linear relationship means that an increase in connectivity creates similar results for both small and large networks. This positive relationship is logical, as having more transfer possibilities generates more ridership.
According to the calculated linear regressions, the slope of case 2 is slightly steeper than the slope of case 1. This means that when walking transfers are considered as part of the network, an increase in connectivity generates slightly higher ridership.

Figure 4.12: Annual commuting transit trips per capita versus structural connectivity in the GVRD, panels (a) and (b)

2. Local Index of Transit Availability LITA: an indicator used to evaluate transit performance based on the frequency of service, the capacity and the coverage (Section 2.1.7). Figure 4.13 shows that there is a positive correlation between ridership and the LITA. The result is logical, as the LITA is related to bus frequency, capacity and coverage. Thus, if frequency, capacity or coverage increases in the network, more ridership will be generated.

Figure 4.13: Annual commuting transit trips per capita versus Local Index of Transit Availability in the GVRD

3. Coverage σ: coverage can be defined as the percentage of bus stop area covered compared to the area served (Section 2.1.6). From Figure 4.14 (b) a positive correlation between coverage and ridership can be identified. This result is logical, as coverage is often related to the concept of accessibility: if accessibility increases, ridership increases as well (Derrible and Kennedy [5]). However, Figure 4.14 (a) shows that for medium to high values of coverage (σ > 0.3) the correlation is not perfectly linear. Beyond σ > 0.3, an additional bus stop in the network will not cause a linear increase in coverage.
Therefore, in very large networks, building an extra bus stop will not generate as significant an increase in coverage as it would in a small network (Derrible and Kennedy [5]). Moreover, as mentioned in the literature review (Section 2.1.6), a study of the relationship between ridership and coverage applied to the metro networks of the world found a logarithmic relationship. Compared with that study by Derrible and Kennedy, the results obtained for the GVRD seem to follow a similar pattern for the coverage indicator. However, in the GVRD only 2.5% of the TAZs present σ > 0.3, so the zonal networks are sufficiently small that the addition of bus stops can still attract a significant number of new transit commuting trips in each TAZ. A linear relationship between ridership and coverage is therefore a reasonable assumption for the TAZs in the GVRD.

4. Overlapping Degree OD: defined as the percentage of route length which is covered by more than one route; the mathematical expression was introduced previously in Table 3.2. Figure 4.15 presents a plot of OD versus ridership, in which it is hard to observe a linear relationship. This may happen because there are TAZs which have no overlapping between their routes but have a positive value of ridership (OD = 0, TCPC > 0). Therefore, OD was not considered as an explanatory variable in the ridership models.

To summarize, five indicators (σ, β″, γ″, ρ and LITA) present a positive linear relationship with ridership. Based on the estimated goodness-of-fit values (R² and t-statistics), the most influential indicators are γ″ and the LITA, followed by β″, ρ and σ.

4.2.2 Multiple Variable Linear Regressions

Once single variable linear regressions had been attempted, the next step was to group independent indicators and to perform multiple linear regressions. According to the correlation between indicators, two classes of models were developed:

1.
Models including coverage σ and a connectivity indicator (either β ”, γ” or ρ). For the three connectivity indicators two cases were considered: 1) networks with only walking links and 2) networks including walking transfers between bus stops and walking links. 105 4.2. Application of Transit Indicators in Ridership Models in the GVRD Bus System (a) Before outlier analysis (b) After outlier analysis Figure 4.14: Annual commuting transit trips per capita versus Coverage in the GVRD 106 4.2. Application of Transit Indicators in Ridership Models in the GVRD Bus System Figure 4.15: Annual commuting transit trips per capita versus Local Index of Transit Availability in the GVRD The goodness of fit statistics are presented in Tables 4.6 and 4.7. The regression constants are very small: in all six regressions . In addition to this, all t-test calculations for the three indicators reached the 95% confidence level. From looking at the significance values for the parameters of the three network design indicators, it seems coverage has less weight than the connectivity indicators. Furthermore, is important to mention, the parameter for two of the connectivity indicators (γ and β ) in case 1 are twice the value of the parameter in case 2. However, the values of the connectivity indicators are always higher in case 2 than in case 1. Thus, the overall result is higher values of the connectivity indices due to the acknowledgment of the presence of walking transfers implies more convenience, more options for riders to connect to other bus routes; turning into higher ridership. Therefore, case 2 gives a more accurate value for connectivity, as the model parameter is lower therefore capturing less variability from other variables affecting ridership but not included in the model (i.e socioeconomic variables, transit elements, etc). 107 4.2. 
Application of Transit Indicators in Ridership Models in the GVRD Bus System Finally, the goodness-of-fit (adjusted R2 ) values ranged between 0.67 and 0.69 which is acceptable. Case 1: Only considering transit links: TCT P = 5.149 + 103.821σ + 50.04β ” (4.1) TCT P = 4.938 + 95.184σ + 148.645γ” (4.2) TCT P = 3.888 + 99.018σ + 12.216ρ (4.3) Case 2:Considering walking transfers to bus stops and transit links: TCT P = 5.247 + 99.67σ + 28.334β ” (4.4) TCT P = 5.056 + 90.265σ + 83.625γ” (4.5) TCT P = 4.033 + 111.268σ + 12.744ρ (4.6) 108 4.2. Application of Transit Indicators in Ridership Models in the GVRD Bus System Regression Statistics No. Parameters Model including: σ,β σ,γ σ,ρ 3 3 3 No. Observations 572 572 572 t-stat Intercept=7.4 σ =9.178 Intercept=7.235 σ =8.481, Intercept=5.58 σ =9.141 β ” =11.288 γ” =12.28 ρ=12.554 R2 0.671 0.685 0.689 Adjusted R2 0.669 0.683 0.687 Standard Error 11.868 11.609 11.537 SSE 167538.9 167538.9 167538.9 AIC -2276.063 -2285.656 -2297.878 Table 4.6: Multiple linear regression results (for models including coverage, overlapping degree and one connectivity indicator) for case 1 (only transit links) in the GVRD Regression Statistics Model including: σ,β σ,γ σ,ρ No. Parameters 3 3 3 No. Observations 572 572 572 t-stat Intercept=7.49 Intercept=7.345 Intercept=5.73 σ = 8.414 σ =7.63 σ =10.668 β ”=10.925 γ”=11.829 ρ=12.002 0.666 0.679 0.681 R2 AdjustedR2 0.664 0.677 0.68 Standard Error 11.962 11.728 11.682 SSE 167538.9 167538.9 167538.9 AIC -2273.364 -2306.363 -2290.292 Table 4.7: Multiple linear regression results (for models including coverage, overlapping degree and one connectivity indicator) for case 2 (walking transfers between bus stops and transit links) in the GVRD 109 4.2. Application of Transit Indicators in Ridership Models in the GVRD Bus System Regression Statistics Model including: LITA No. Parameters 2 No. 
Observations 572 t-stat Intercept= 3.145 LITA=24.815 R2 0.611 AdjustedR2 0.61 Standard Error 12.668 SSE 161723.5 AIC -2222.179 Table 4.8: Multiple linear regression results (for models including the Local Index of Transit Availability) in the GVRD 2. Models including only the local index of transit availability LITA: the LITA indicator was isolated in the models as this indicator was strongly correlated with connectivity indicators and coverage σ (correlation coefficients of r > 0.4). These high correlations are logical as the LITA indicator is estimated based on the service frequency, which is also the main component for calculating connectivity indicators. Additionally, LITA also considers coverage and capacity scores. This model was already explained in the single linear regression models section (Section 4.2.1) and the regression statistics are shown in Table 4.8. TCT P = 2.548 + 49.89LITA (4.7) The statistical results of the seven linear regressions (equations 4.2.2 to 4.2.2) developed suggest that the five indicators ( σ , γ , β , ρ and LITA) have an impact over ridership and they should be maximized while performing transit network design . The R2 values obtained from the models (between 0.61 to 0.69) are acceptable and suggest a zonal level of aggregation perhaps is not the most appropriate way to capture the impact of the transit indicators over transit ridership. As the values of ridership and transit indicators present a very high variability among all the different TAZs; it is very complicated to obtain linear regressions with more satisfactory values of goodness of fit (R2 > 0.7) to explain ridership. Additionally, the value 110 4.3. Application of Transit Indicators in Collision Prediction Models in the GVRD of the model intercept in the seven regressions is somewhat high, revealing that they variability due to factors which affect ridership (i.e. 
socioeconomic variables such as income, auto ownership or even transit elements) but that were not included in the models is being captured in the model intercept.

For the reasons explained in the previous paragraph, the predictions obtained from the ridership models developed in this study should be interpreted carefully. They might not generate accurate results, but they can still be applied by practitioners as a guide when comparing various transit network design options.

4.3 Application of Transit Indicators in Collision Prediction Models in the GVRD

Collision Prediction Models (CPMs) using various exposure variables and transit indicators as explanatory variables were developed. The developed CPMs were categorized into: Transit Infrastructure Models, Transit Network Topology Models, Route Design Models, and Transit Performance and Operations Models. Tables 4.9, 4.10 and 4.11 present models that predict total collisions (auto and transit) and their goodness-of-fit summary statistics. From the t-statistics it can be seen that all explanatory variables are significant across all models presented at the 95% confidence level. However, in a small number of models the intercept a0 of the logarithmic equation presented a low t-statistic (slightly below 1.96), meaning that the constant term in those CPMs probably does not influence the model, or equals 1.0. Moreover, the two statistical criteria used to measure goodness of fit (scaled deviance SD and Pearson χ²) presented values below the target χ² at the 95% confidence level across all models.

The models revealed that increased collisions are positively associated with the main exposure variables (VKT, VHKT or TLKM). This is consistent with intuitive expectations and with the results obtained in previous CPMs developed by Hadayeghi et al. [9], Ladron de Guevara et al. [15], Lovegrove and Sayed [17] and Cheung et al. [3].
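The goodness-of-fit check described above (scaled deviance and Pearson χ² compared against a critical χ² value for the model's degrees of freedom) can be sketched as follows. This is only an illustration, not the procedure used in the thesis: the critical value is approximated with the Wilson-Hilferty formula so that no statistical library is needed, and the function names are hypothetical. The example row (df = 479, SD = 530, Pearson χ² = 523) is taken from the bus stop density model in Table 4.9.

```python
import math

# Standard normal quantile leaving 5% in the upper tail.
Z_95 = 1.6448536269514722


def chi2_critical_95(df: int) -> float:
    """Approximate the upper 5% critical value of a chi-square
    distribution with the Wilson-Hilferty cube approximation."""
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + Z_95 * math.sqrt(a)) ** 3


def passes_fit(df: int, scaled_deviance: float, pearson_chi2: float) -> bool:
    """A model passes when both statistics fall below the critical value."""
    critical = chi2_critical_95(df)
    return scaled_deviance < critical and pearson_chi2 < critical


# Row from Table 4.9: 1.898 VKT^0.643 exp(2.291 BDEN)
critical = chi2_critical_95(479)
print(f"critical chi-square for df=479: {critical:.1f}")
print("model passes:", passes_fit(479, 530, 523))
```

The approximated critical value is close to the tabulated value of 531 for this row, so both statistics fall just under the target.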
Moreover, the main exposure variables VKT and VHKT were the most influential attributes in the models, whereas total transit kilometers TLKM was not. In some models having TLKM as the main exposure variable, the other explanatory variables were the most significant, but in most of these models most of the variation was attributed to the model constant term a0. This indicates that TLKM may not be the best main exposure variable in this case. The relationship between explanatory variables and total collisions is explained below based on the four themes of models.

Transit Infrastructure Models

Transit infrastructure variables refer to bus system elements physically present in the road network. They include the number of bus stops NSTOP, bus stop density BDEN, interstation spacing IS and length of HOV lanes (BUSHOV, TWOHOV, THRHOV). The models obtained in this group revealed that increased collisions are related to an increase in both the number of stops and the bus stop density. The positive correlation of collisions with bus stop density is consistent with the results of the transit safety studies developed in the Chicago and Toronto areas by Jovanis [12] and Cheung et al. [3] respectively. The positive parameters of bus stop density suggest that a higher number of collisions can be expected in zones with high bus stop density. The presence of more bus stops in a zone tends to increase the number of times transit vehicles pull over to drop off or pick up passengers, and buses that stop more frequently can cause conflicts with other vehicles.

While developing the models, the lengths of bus-only, 2+ and 3+ HOV lanes were tested as potential explanatory variables. However, only the lengths of bus-only and 3+ HOV lanes proved to be significant.
Thus, the lengths of bus-only and 3+ HOV lanes were summed and combined into one variable named BTHOV. In the models BTHOV is positively correlated with collisions and is significant. However, BTHOV is the least influential variable when compared with the other explanatory variables in the group. This outcome diverges from the results obtained by Cheung et al. [3], where priority bus-HOV lanes were not significant at the 5% significance level. Additionally, it is interesting to point out that HOV lanes are claimed to enhance safety by separating transit vehicles from other vehicles. However, the results of this study imply that the extra maneuvers that transit vehicles have to perform to get in and out of such priority lanes may actually lead to an increased risk of collision. Nevertheless, more empirical evidence is required to reach a final conclusion on the actual impact of HOV lanes on safety.

On the other hand, the models showed that decreased collision occurrence is related to increases in the following explanatory variables: interstation spacing and number of far-sided and mid-block bus stops. The negative parameters for interstation spacing and far-sided stops match the results obtained in the transit collision models developed by Cheung et al. [3] in the Toronto area. In the case of interstation spacing the inverse relationship is logical, as a small spacing between bus stops (corresponding to a higher bus stop density) raises the frequency of transit vehicle stop-and-go events (Cheung et al. [3]). Far-sided and mid-block stops are usually considered safer as they avoid the conflicts generated by near-sided stops. The principal safety concern at near-sided stops is that during a green light, transit vehicles stop while other vehicles are trying to turn right; this is a common cause of rear-end collisions.
Transit Network Topology Models

Transit network topology refers to variables counting vertices, edges or loops. It consists of connectivity indicators (structural connectivity ρ, proposed degree of connectivity γ, complexity β), average edge length η and average length per vertex θ. Two cases were considered for the indicators: 1) networks including only transit links and 2) networks including walking transfers between bus stops and transit links. Models were developed for both types of networks.

The results from the developed CPMs showed that there is a positive correlation between collision frequency and all connectivity indicators (i.e. degree of connectivity, complexity and structural connectivity). Furthermore, connectivity indicators seem almost as influential as the main exposure variables (VKT, VHKT or TLKM). Connectivity indicators have never been included as potential explanatory variables in CPMs, therefore the results obtained can only be compared with logical expectations. Although three different connectivity indicators were applied in the models, in general connectivity can be defined as the availability of links between vertices in the transit network. The structural connectivity ρ is associated with the number of transfer possibilities and the number of transfer vertices. On the other hand, the degree of connectivity γ and the complexity β are related to the number of edges, route frequency and network configuration. Based on these definitions, positive parameters for the connectivity variables in the CPMs are found because a more connected network will have more frequent routes and more route transfers or links between nodes. This gives more opportunities for vehicle conflicts and collisions to happen.
Another important observation is that the model parameters for all indicators are higher in case 2 than in case 1. Conversely, the leading constant term is slightly higher in case 1 than in case 2. The overall effect implies that the models will predict a higher collision occurrence in case 2 than in case 1. Thus, ignoring walking transfers between bus stops as part of the transit network can underestimate the actual collision occurrence. Additionally, this outcome reflects how walking transfers affect safety as well. A high number of walking transfers implies that bus stops are close to each other (stops must be within walking distance, less than 80 m), thus generating more frequent transit stop-and-go events that might conflict with other vehicles, transit users and even pedestrians.

Finally, the average length per vertex θ was not significant, and an inverse relationship between collisions and average edge length η was obtained. The concept of average edge length η is similar to interstation spacing, thus the effects of this variable have the same explanation as those of interstation spacing.

Route Design Models

Transit route design variables consist of elements of the network design which can influence system performance. Variables in this group include the number of routes NR, interstation spacing IS, route frequency in the AM peak fAM, coverage σ, overlapping degree OD and connectivity indicators (structural connectivity ρ, degree of connectivity γ′ and complexity β′). The models revealed that there is a positive correlation between collisions and the number of routes, the route frequency in the AM peak, the coverage, the overlapping degree and the connectivity indicators. The effect of interstation spacing and connectivity indicators on collisions was explained previously in the transit infrastructure and network topology groups.
On the other hand, the positive parameter for bus route frequency suggests that an increased flow of transit vehicles can increase the risk of conflicts between transit and other vehicles, thus generating increased collision occurrence. Similarly, the positive parameter for coverage suggests an increased spatial presence of transit, which implies increased opportunities for collisions to occur. The last association to mention is that between the number of routes NR or the overlapping degree OD and the number of collisions. Both variables are positively correlated with the number of collisions, which reflects how the interaction of more than one transit vehicle with the rest of the traffic flow also increases the risk of collisions.

Finally, it is important to mention that, based on the t-statistics, the interstation spacing IS and the proposed connectivity indicators (γ and β) are the most influential explanatory variables in this group. Nevertheless, they are not as significant as the main exposure variable (either VKT, VHKT or TLKM).

Transit Performance and Operations Models

Transit performance variables reflect the way transit service goals and objectives are being achieved, in terms of quality of the service and usage of the service, among others. Operations variables, on the other hand, refer to transit facility management elements which can improve the system performance. The variables in this group are the local index of transit availability LITA and the transit commuting trips per capita TCPC, which represent performance, and the frequency of routes in the AM peak fAM and the service time span T, which represent operations. The models developed revealed that collisions are positively correlated with all variables in the group.
All Models

The CPMs described in the previous sections revealed that collisions are positively correlated with the number of stops NSTOP, number of routes NR, bus stop density BDEN, coverage σ, overlapping degree OD, connectivity indicators (structural connectivity ρ, proposed degree of connectivity γ and proposed complexity β), transit commuting trips per capita TCPC, AM peak route frequency fAM, service time span T and the local index of transit availability LITA. Despite this result, it cannot be concluded that improving the transit network (by adding bus stops, coverage, connectivity or any of the other explanatory variables mentioned) will lead to an increase in collision frequency, because improving transit will likely generate a reduction in auto usage and, as a result, a reduction in the main exposure variable (VKT or VHKT). Although the percentage by which the listed transit variables change as the main exposure variables change (i.e. elasticity) was not studied in this research, the expected outcome is that improving transit (by increasing the listed transit variables) will generate an overall positive effect on safety.
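The trade-off described above can be made concrete with the model form used in Tables 4.9 to 4.11, T3 = a0 · (exposure)^a1 · exp(Σ bj xj). The sketch below is illustrative only: the coefficients come from the bus stop density model in Table 4.9, whereas the zone inputs (VKT, BDEN and the increase in BDEN) are hypothetical values chosen for the example, and the function names are not from the thesis. It computes the reduction in exposure that would exactly offset an increase in a transit variable, which follows from setting (1 − r)^a1 · exp(b · Δx) = 1.

```python
import math


def predict_collisions(a0, a1, exposure, coeffs, values):
    """Evaluate T3 = a0 * exposure**a1 * exp(sum of b_j * x_j)."""
    linear = sum(b * values[name] for name, b in coeffs.items())
    return a0 * exposure ** a1 * math.exp(linear)


def breakeven_exposure_reduction(a1, b, delta_x):
    """Fractional cut in exposure that offsets raising x by delta_x,
    obtained by solving (1 - r)**a1 * exp(b * delta_x) = 1 for r."""
    return 1.0 - math.exp(-b * delta_x / a1)


# Coefficients from Table 4.9: 1.898 VKT^0.643 exp(2.291 BDEN)
A0, A1, B_BDEN = 1.898, 0.643, 2.291

# Hypothetical zone inputs (assumed values, not taken from the thesis data).
vkt, bden, delta = 50_000.0, 0.20, 0.05

base = predict_collisions(A0, A1, vkt, {"BDEN": B_BDEN}, {"BDEN": bden})
r = breakeven_exposure_reduction(A1, B_BDEN, delta)
offset = predict_collisions(
    A0, A1, vkt * (1.0 - r), {"BDEN": B_BDEN}, {"BDEN": bden + delta}
)

print(f"baseline prediction: {base:.1f}")
print(f"break-even VKT reduction: {r:.1%}")
print(f"prediction after both changes: {offset:.1f}")
```

Under these assumed inputs, a VKT reduction of roughly 16% would leave the predicted collisions unchanged. Whether transit improvements actually produce exposure reductions of that size is the elasticity question that was not studied in this research.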
Model form T3 = | κ | df | SD | Pearson χ² | χ²(0.05, df) | t-statistics
Transit Infrastructure:
2.273 VKT^0.631 exp(0.011 NSTOP + 0.706 BTHOV) | 1.67 | 453 | 501 | 467 | 504 | a0=2.34, VKT=13, NSTOP=2.48, BTHOV=2.17
2.324 VKT^0.671 exp(−0.174 MB + 0.652 BTHOV) | 1.69 | 454 | 502 | 460 | 505 | a0=2.58, VKT=16.13, MB=6.15, BTHOV=2.29
1.898 VKT^0.643 exp(2.291 BDEN) | 1.70 | 479 | 530 | 523 | 531 | a0=2.16, VKT=17.38, BDEN=8.52
29.591 VKT^0.760 exp(−1.139 FS + 0.381 BTHOV) | 1.76 | 474 | 524 | 493 | 526 | a0=8.64, VKT=18.37, FS=9.96, BTHOV=2.13
3.673 VKT^0.671 exp(−3.259 IS + 0.717 BTHOV) | 1.83 | 462 | 510 | 506 | 513 | a0=3.98, VKT=16.28, IS=10.3, BTHOV=3.34
Transit Network Topology (only considering transit links):
4.1 VKT^0.552 exp(3.856γ) | 1.69 | 477 | 528 | 484 | 529 | a0=4.51, VKT=13.82, γ=8.56
4.305 VKT^0.5426 exp(1.456β) | 1.68 | 575 | 526 | 484 | 527 | a0=4.64, VKT=13.62, β=8.57
2.078 VKT^0.644 exp(0.270ρ) | 1.66 | 469 | 520 | 494 | 520 | a0=2.32, VKT=16.38, ρ=7.17
2.221 VKT^0.679 exp(−0.128η) | 1.64 | 462 | 511 | 490 | 513 | a0=2.39, VKT=16.08, η=4.38
Transit Network Topology (considering walking transfers between bus stops and transit links):
4.50 VKT^0.540 exp(2.153γ) | 1.68 | 480 | 532 | 515 | 532 | a0=4.79, VKT=13.51, γ=8.73
1.46 VKT^0.543 exp(0.841β) | 1.70 | 477 | 528 | 510 | 529 | a0=4.69, VKT=13.50, β=8.91
2.10 VKT^0.642 exp(0.322ρ) | 1.65 | 472 | 524 | 491 | 524 | a0=2.34, VKT=16.26, ρ=7.48
2.157 VKT^0.685 exp(−0.241η) | 1.65 | 463 | 511 | 509 | 514 | a0=2.31, VKT=16.23, η=4.22
Route Design (considering only transit links):
7.829 VKT^0.519 exp(3.045γ − 1.844 IS) | 1.79 | 476 | 526 | 528 | 528 | a0=6.6, VKT=13.54, γ=7.37, IS=7.35
7.998 VKT^0.502 exp(1.049β − 1.862 IS) | 1.78 | 476 | 526 | 527 | 527 | a0=6.64, VKT=13.51, β′=7.14, IS=7.41
1.716 VKT^0.723 exp(0.104ρ − 2.833 IS + 0.003 fAM) | 2.13 | 454 | 497 | 497 | 505 | a0=1.63, VKT=17.58, ρ=2.59, IS=9.45, fAM=5.8
2.61 VKT^0.582 exp(2.315γ + 3.249σ) | 1.81 | 475 | 525 | 517 | 527 | a0=3.15, VKT=15.19, γ=4.68, σ=5.65
2.504 VKT^0.589 exp(0.803β + 3.171σ) | 1.81 | 474 | 524 | 519 | 526 | a0=3.01, VKT=15.4, β=4.53, σ=5.46
1.592 VKT^0.649 exp(0.112ρ + 3.716σ) | 1.77 | 475 | 526 | 526 | 527 | a0=1.61, VKT=18.13, ρ=2.87, σ=6.85
2.822 VKT^0.672 exp(−3.02 IS + 0.0139 NR) | 2.05 | 455 | 495 | 495 | 506 | a0=3.31, VKT=17.14, IS=10.16, NR=6.95
1.502 VKT^0.647 exp(3.674σ + 0.419 OD) | 1.78 | 476 | 527 | 522 | 528 | a0=1.42, VKT=18.16, σ=7.3, OD=3.49
Route Design (considering walking transfers between bus stops and transit links):
5.299 VKT^0.561 exp(2.09γ − 1.955 IS) | 1.95 | 469 | 515 | 508 | 521 | a0=5.25, VKT=13.93, γ=8.56, IS=7.77
4.814 VKT^0.576 exp(0.736β − 1.965 IS) | 1.96 | 468 | 514 | 498 | 519 | a0=4.95, VKT=14.28, β′=8.41, IS=7.86
1.61 VKT^0.726 exp(0.162ρ − 2.718 IS + 0.0025 fAM) | 2.28 | 448 | 485 | 482 | 498 | a0=1.45, VKT=17.92, ρ=3.54, IS=9.24, fAM=5
2.3 VKT^0.594 exp(1.684γ + 2.708σ) | 1.92 | 471 | 519 | 488 | 523 | a0=2.8, VKT=15.93, γ″=5.87, σ=2.708
2.722 VKT^0.579 exp(0.552β + 2.805σ) | 1.83 | 474 | 523 | 519 | 526 | a0=3.28, VKT=15.11, β=5.28, σ=4.76
1.592 VKT^0.649 exp(0.112ρ + 3.716σ) | 1.64 | 476 | 527 | 526 | 528 | a0=1.7, VKT=17.84, ρ=3.72, σ=6.74
Performance and Operations:
1.707 VKT^0.59 exp(0.009 TCPC + 0.116 LITA + 0.002 fAM) | 1.88 | 473 | 522 | 477 | 525 | a0=1.72, VKT=15.2, TCPC=, LITA=4.01, fAM=4.25

Table 4.9: CPMs using VKT as exposure for the GVRD.

Model form T3 = | κ | df | SD | Pearson χ² | χ²(0.05, df) | t-statistics
Transit Infrastructure:
2.273 VHKT^0.633 exp(0.011 NSTOP + 0.56 BTHOV) | 1.68 | 451 | 499 | 464 | 502 | a0=2.3, VHKT=13.11, NSTOP=2.39, BTHOV=1.74
2.249 VHKT^0.674 exp(−0.165 MB + 0.745 BTHOV) | 1.70 | 451 | 499 | 455 | 502 | a0=2.49, VHKT=16.29, MB=5.77, BTHOV=2.34
2.014 VHKT^0.632 exp(2.482 BDEN) | 1.72 | 476 | 528 | 525 | 528 | a0=2.35, VHKT=17, BDEN=8.58
27.298 VHKT^0.759 exp(−1.101 FS + 0.381 BTHOV) | 1.71 | 476 | 528 | 514 | 528 | a0=8.42, VHKT=18.08, FS=9.53, BTHOV=2.10
3.95 VHKT^0.663 exp(−3.271 IS + 0.729 BTHOV) | 1.82 | 462 | 510 | 508 | 513 | a0=4.21, VHKT=16.13, IS=10.3, BTHOV=3.38
Transit Network Topology (only considering transit links):
4.247 VHKT^0.547 exp(3.90γ) | 1.69 | 477 | 528 | 484 | 529 | a0=4.64, VHKT=13.75, γ=8.65
4.461 VHKT^0.542 exp(1.471β) | 1.68 | 526 | 584 | 422 | 527 | a0=4.77, VHKT=13.55, β=8.65
2.071 VHKT^0.645 exp(0.271ρ) | 1.65 | 470 | 521 | 494 | 522 | a0=2.32, VHKT=16.55, ρ=7.20
2.531 VHKT^0.664 exp(−0.13η) | 1.62 | 463 | 512 | 495 | 514 | a0=2.75, VHKT=15.55, η=4.41
Transit Network Topology (considering walking transfers between bus stops and transit links):
4.658 VHKT^0.536 exp(2.176γ) | 1.68 | 480 | 532 | 515 | 532 | a0=4.92, VHKT=13.44, γ=8.82
4.531 VHKT^0.539 exp(0.85β) | 1.70 | 477 | 528 | 510 | 529 | a0=4.81, VHKT=13.43, β=8.99
2.17 VHKT^0.471 exp(0.427ρ) | 1.65 | 472 | 524 | 491 | 524 | a0=2.45, VHKT=16.16, ρ=7.57
2.298 VHKT^0.678 exp(−0.239η) | 1.64 | 463 | 512 | 510 | 514 | a0=2.49, VHKT=16.03, η=4.17
Route Design (considering only transit links):
8.09 VHKT^0.516 exp(3.081γ − 1.851 IS) | 1.79 | 476 | 526 | 527 | 527 | a0=6.73, VHKT=13.51, γ=7.46, IS=7.38
11.472 VKT^0.474 exp(1.069β − 1.069 IS) | 1.74 | 479 | 531 | 528 | 531 | a0=9.43, VHKT=14.58, β=7.16, IS=7.8
1.716 VKT^0.723 exp(0.104ρ − 2.833 IS + 0.003 fAM) | 2.19 | 452 | 491 | 500 | 503 | a0=1.61, VHKT=17.65, ρ=2.72, IS=9.54, fAM=5.8
2.59 VHKT^0.582 exp(2.425γ + 3.096σ) | 1.82 | 474 | 524 | 516 | 526 | a0=3.14, VHKT=15.34, γ=4.84, σ=5.32
2.775 VHKT^0.582 exp(0.764β + 2.996σ) | 1.72 | 478 | 530 | 518 | 530 | a0=3.24, VHKT=14.77, β=4.3, σ=5.23
1.665 VHKT^0.643 exp(0.119ρ + 3.717σ) | 1.78 | 474 | 524 | 523 | 526 | a0=1.77, VHKT=17.96, ρ=3.06, σ=6.88
2.934 VHKT^0.668 exp(−3.031 IS + 0.014 NR) | 2.04 | 455 | 496 | 496 | 506 | a0=3.44, VHKT=17.03, IS=10.19, NR=7.05
1.526 VHKT^0.645 exp(3.714σ + 0.427 OD) | 1.77 | 476 | 527 | 522 | 528 | a0=1.47, VHKT=18.10, σ=7.38, OD=3.56
Route Design (considering walking transfers between bus stops and transit links):
5.515 VHKT^0.557 exp(2.114γ − 1.957 IS) | 1.95 | 469 | 515 | 507 | 520 | a0=5.41, VHKT=13.88, γ=8.66, IS=7.77
5.015 VHKT^0.571 exp(0.745β − 1.967 IS) | 1.95 | 468 | 514 | 498 | 519 | a0=5.10, VHKT=14.23, β=8.50, IS=7.87
1.705 VHKT^0.724 exp(0.108ρ − 2.83 IS + 0.003 fAM) | 2.19 | 452 | 491 | 500 | 503 | a0=1.61, VHKT=17.65, ρ=2.72, IS=9.54, fAM=5.80
2.958 VHKT^0.574 exp(1.444γ + 2.414σ) | 1.74 | 479 | 531 | 516 | 531 | a0=3.46, VHKT=14.6, γ=5.04, σ=4.19
2.958 VHKT^0.576 exp(0.483β + 2.547σ) | 1.72 | 479 | 531 | 518 | 530 | a0=3.42, VHKT=14.58, β=4.72, σ=4.43
1.665 VHKT^0.643 exp(0.119ρ + 3.717σ) | 1.78 | 474 | 524 | 523 | 526 | a0=1.77, VHKT=17.96, ρ=3.06, σ=6.88
Performance and Operations:
1.75 VHKT^0.587 exp(0.009 TCPC + 0.117 LITA + 0.002 fAM) | 1.88 | 473 | 522 | 477 | 525 | a0=1.8, VHKT=15.16, TCPC=6.69, LITA=4.05, fAM=4.25
1.53 VHKT^0.581 exp(0.199 LITA + 0.002 fAM + 0.02 T) | 1.77 | 472 | 523 | 513 | 524 | a0=1.29, VHKT=14.23, LITA=7.29, fAM=4.5, T=2.28

Table 4.10: CPMs using VHKT as exposure in the GVRD

Model form T3 = | κ | df | SD | Pearson χ² | χ²(0.05, df) | t-statistics
Transit Infrastructure:
68.717 TLKM^0.476 exp(−0.137 MB + 1.148 BTHOV) | 1.54 | 416 | 464 | 393 | 465 | a0=17.22, TLKM=7.38, MB=4.25, BTHOV=3.31
15.145 TLKM^0.702 exp(4.536 BDEN + 0.388 BTHOV) | 1.65 | 461 | 510 | 446 | 512 | a0=10.78, TLKM=11.45, BDEN=12.38, BTHOV=1.75
1682.103 TLKM^0.796 exp(−1.504 FS + 0.762 BTHOV) | 1.58 | 455 | 506 | 430 | 506 | a0=20.12, TLKM=11.30, FS=10.40, BTHOV=3.86
162.894 TLKM^0.372 exp(−2.719 FS + 1.087 BTHOV) | 1.56 | 443 | 493 | 425 | 493 | a0=20.98, TLKM=6.17, IS=8.30, BTHOV=3.51
Transit Network Topology (only considering transit links):
58.149 TLKM^0.41 exp(5.787γ) | 1.58 | 464 | 515 | 439 | 515 | a0=17.84, TLKM=6.33, γ=11.09
61.67 TLKM^0.403 exp(2.031β) | 1.56 | 457 | 507 | 422 | 508 | a0=17.52, TLKM=6.65, β=10.16
47.684 TLKM^0.477 exp(0.367ρ) | 1.54 | 442 | 492 | 425 | 492 | a0=15.12, TLKM=7.48, ρ=8.58
Transit Network Topology (considering walking transfers between bus stops and transit links):
65.072 TLKM^0.379 exp(3.295γ) | 1.57 | 457 | 507 | 421 | 508 | a0=17.28, TLKM=6.79, γ=10.41
69.63 TLKM^0.365 exp(1.201β) | 1.58 | 462 | 512 | 433 | 513 | a0=18.25, TLKM=6.11, β=11.04
49.615 TLKM^0.471 exp(0.427ρ) | 1.54 | 447 | 497 | 439 | 497 | a0=15.53, TLKM=7.50, ρ=8.86
Route Design (only considering transit links):
75.437 TLKM^0.424 exp(4.418γ − 1.679 IS + 0.008 NR) | 1.69 | 452 | 502 | 459 | 503 | a0=16.88, TLKM=6.74, γ=6.96, IS=5.08, NR=3
73.707 TLKM^0.438 exp(1.459β − 1.716 IS + 0.008 NR) | 1.69 | 450 | 500 | 457 | 500 | a0=16.62, TLKM=6.85, β=6.3, IS=5.21, NR=3.24
10.575 TLKM^0.543 exp(0.204ρ − 2.395 IS + 0.012 NR) | 1.68 | 450 | 498 | 474 | 500 | a0=15.86, TLKM=8.88, ρ=4.28, IS=7.80, NR=4.62
15.892 TLKM^0.622 exp(2.697γ + 5.319σ + 0.444 OD) | 1.74 | 455 | 503 | 453 | 506 | a0=9.10, TLKM=9.03, γ=4.39, σ=7.34, OD=3.10
14.3 TLKM^0.657 exp(0.783β + 5.259σ + 0.514 OD) | 1.94 | 460 | 512 | 457 | 511 | a0=8.59, TLKM=9.38, β=3.63, σ=7.34, OD=3.53
14.817 TLKM^0.683 exp(0.168ρ + 6.06σ) | 1.66 | 456 | 505 | 458 | 507 | a0=10.27, TLKM=11.07, ρ=4.06, σ=9.35
58.056 TLKM^0.568 exp(−2.480 IS + 0.006 fAM) | 1.68 | 455 | 503 | 470 | 506 | a0=15.5, TLKM=9.11, IS=8.15, fAM=9.50
10.552 TLKM^0.731 exp(5.752σ + 0.55 OD + 0.002 fAM) | 1.72 | 450 | 498 | 452 | 500 | a0=8.17, TLKM=11.4, σ=7.70, OD=3.70, fAM=2.57
Route Design (considering walking transfers between bus stops and transit links):
108.875 TLKM^0.549 exp(2.103γ − 1.710 IS) | 1.66 | 464 | 515 | 452 | 515 | a0=19.60, TLKM=5.9, γ=10.27, IS=5.26
78.618 TLKM^0.409 exp(0.945β − 1.647 IS + 0.006 NR) | 1.75 | 455 | 505 | 442 | 506 | a0=17.53, TLKM=6.70, β=8.12, IS=5.07, NR=2.65
55.307 TLKM^0.558 exp(0.174ρ − 2.394 IS + 0.004 fAM) | 1.69 | 452 | 502 | 464 | 503 | a0=15.31, TLKM=8.96, ρ=3.32, IS=7.70, fAM=6.14
25.444 TLKM^0.549 exp(2.103γ + 4.615σ) | 1.67 | 464 | 513 | 466 | 515 | a0=11.90, TLKM=8.48, γ=6.52, σ=6.25
15.83 TLKM^0.632 exp(0.595β + 5.242σ + 0.395 OD) | 1.68 | 465 | 518 | 459 | 516 | a0=8.88, TLKM=9, β=4.88, σ=7.14, OD=2.77
10.575 TLKM^0.727 exp(0.103ρ + 6.053σ + 0.527 OD) | 1.72 | 453 | 501 | 451 | 504 | a0=8.51, TLKM=11.59, ρ=1.88, σ=9.52, OD=3.36
Performance and Operations:
4.96 TLKM^0.681 exp(0.01 TCPC + 0.17 LITA + 0.003 fAM) | 1.80 | 467 | 518 | 475 | 518 | a0=5.57, TLKM=12.15, TCPC=7.0, LITA=5.34, fAM=6.0, T=4.45

Table 4.11: CPMs using TLKM as exposure in the GVRD

Chapter 5
Conclusions

This conclusions chapter is divided into three sections. Firstly, Section 5.1 is a summary of all chapters, including the main research conclusions. Secondly, in Section 5.2 the main contributions offered in this research are listed, and a justification of their role in adding to the existing body of knowledge in the field of transit network design and safety planning is provided. Finally, in Section 5.3 future research topics are suggested.

5.1 Summary

The main goal of this research was to propose improved transit network indicators with the use of graph theory.
This study also aimed to apply the proposed and existing transit network indicators in the development of macro-level ridership and collision prediction models. Both the transit ridership and the safety models could be useful tools to help practitioners in the decision-making process at the planning stage. This study was applied to the GVRD public transportation system at the zonal level (577 traffic analysis zones).

The main motivation for this study arose from: 1) the current growing demand for a high quality and safe transit service, 2) the little or no consideration given to network properties and safety issues while designing and planning bus systems and 3) the need for further development of empirical tools useful in transit network design and transit safety planning. Two objectives were stated at the beginning of this study and the corresponding conclusions reached are stated below.

The first research objective was to adapt and improve network design tools for their application in bus systems. The conclusions regarding objective 1 are as follows:

First, the methodology to redraw metro networks into graphs proposed by Derrible and Kennedy [5] was adapted for the case of bus systems. In order to account for bus system specificities, directed graphs were used to represent the predominant one-way bus routes. Additionally, in order to represent the existence of route transfers made by having riders walk from one bus stop to another, stops connected by such “walking links” were merged into a single stop. The merging of stops was based on the assumption that the frequency on those links equals infinity. Finally, since a zonal level of aggregation was selected for all parts of this study (i.e. transit indicators, ridership and safety models), a technique for splitting a whole network and transforming it into various smaller zonal networks was developed as well. The technique divided networks by keeping all links directed towards (i.e.
inside) the zone as part of the graph and by discarding the rest of the links. It is important to mention that, after dividing the network into zones, vertices which were located outside a zone boundary but were still connected to links directed towards the inside of that zone were kept as part of the zonal graph.

After an extensive review of the existing transit indicators in the literature, two inconsistencies were found regarding the traditional connectivity indicators, complexity β and degree of connectivity γ, when applied to rank various networks. The first issue was the inclusion of the multiple edges E_M (which represent the presence of various routes on the same link) in the calculation of the connectivity indicators, as this could produce misleading results. For example, intuitively highly connected networks with a low number of multiple edges could have their connectivity underestimated, while networks with a high number of multiple edges and a low connectivity would end up having their connectivity overestimated. Thus, the need arises for a connectivity indicator which can account for the actual configuration of networks (configuration being defined as the way the edges E are related to pairs of vertices V). The second issue was that the γ and β indicators do not account for transit operational attributes, hence generating inadequate connectivity outcomes. Examples of operational factors include the frequency of routes, the speed of routes, the distance between vertices and the capacity versus demand of links, among many others. For example, networks with more frequent routes can be considered more connected. Thus, two connectivity indicators which can account adequately for configuration, as well as consider transit operational factors, were developed. Due to the limitations of data availability for this research, only the AM peak frequency of bus routes was included to account for the bus operational attributes.
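The first issue described above (multiple edges distorting the traditional indicators) can be illustrated with a small sketch. It assumes the classic graph-theory definitions β = E/V and γ = E/(3(V − 2)), which may differ from the exact forms used in this thesis for directed bus networks. The two five-stop networks, the route counts and the function names are hypothetical, and counting every route on a link as one edge is likewise an assumption made for the illustration.

```python
def beta_index(edges: int, vertices: int) -> float:
    """Classic complexity index: edges per vertex."""
    return edges / vertices


def gamma_index(edges: int, vertices: int) -> float:
    """Classic degree of connectivity for a planar graph (needs V > 2):
    observed edges divided by the maximum possible 3 * (V - 2)."""
    return edges / (3 * (vertices - 2))


def edge_counts(routes_per_link: dict) -> tuple:
    """Return (single edges, total edges counting every route on a link)."""
    single = len(routes_per_link)
    total = sum(routes_per_link.values())
    return single, total


VERTICES = 5

# Hypothetical corridor: few links, several routes sharing each link.
corridor = {("A", "B"): 3, ("B", "C"): 3, ("C", "D"): 2, ("D", "E"): 2}

# Hypothetical grid-like network: more links, one route on each.
grid = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("D", "E"): 1,
        ("A", "C"): 1, ("B", "D"): 1, ("C", "E"): 1}

for name, net in (("corridor", corridor), ("grid", grid)):
    single, total = edge_counts(net)
    print(name,
          "beta(all edges) =", round(beta_index(total, VERTICES), 2),
          "beta(single edges) =", round(beta_index(single, VERTICES), 2),
          "gamma(single edges) =", round(gamma_index(single, VERTICES), 2))
```

With all edges counted, the corridor appears more connected than the grid, whereas with single edges only the ranking reverses, which is the kind of overestimation the configuration component of the proposed indicators is meant to avoid.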
The proposed indicators were an adaptation of the traditional γ and β indexes and were symbolized γ′ and β′. The new indicators consisted of a product of two variables: 1) an operational component: the number of edges normalized by the frequency, which can be understood as a weighted average of the number of edges (both single and multiple) based on bus route frequencies and 2) a configuration component, which consists of the traditional connectivity indicators (either γ or β) estimated using the number of single edges E_S as input. These indicators measure the actual configuration of the networks.

The current transit network indicators (see the list of indicators in Table 4.1) were estimated for the GVRD metro system, as the Vancouver network has not been included in previous studies characterizing various metros from around the world. The data point for the GVRD metro was added to three graphs made by Derrible and Kennedy [5] comparing metro networks around the world. The first graph, or state graph, shows the phase of development or maturity of the metro, according to the complexity β and the degree of connectivity γ indicators. The second graph, called the form graph, describes the spatial relationship between the transit network and the built environment. This graph classifies metro networks as focused on providing good access mostly regionally (trips between suburbs and the city core), locally (trips inside the city core) or both. The third graph, the structure graph, classifies networks as oriented by directness, connectivity or both, based on the directness τ and the structural connectivity ρ indicators. Based on the state graph, the GVRD metro network is on the limit between Phase I and II, meaning that the system was created somewhat recently and is still in the process of developing.
From the form graph it can be seen that the GVRD metro network has a low number of stations but a somewhat long line length, thus it can be classified as “Regionally Accessible”, meaning that the network focuses mainly on providing trips between the suburbs and the city core. Moreover, the GVRD metro network presents the lowest structural connectivity of all the metro networks studied, and it also has low directness. Finally, from the structure graph the GVRD metro network can be classified as connectivity oriented.

The current and proposed indicators were estimated for the GVRD bus system at the zonal level (using the 577 TAZs). Two cases were considered: 1) networks considering only transit links and 2) networks considering route transfers using intermediate walking transfers between bus stops, and transit links. Summary tables with all indicators considered are presented for cases 1 and 2 in Tables 4.4 and 4.5 respectively.

Looking more closely at the spatial distribution of the proposed connectivity indicators (γ′ and β′), it can be seen that in case 1 (only bus-to-bus connections) medium to high connectivity is found inside the CBD and in the west part of the region (basically in the Vancouver municipality). Medium to low connectivity values start appearing gradually when approaching the surrounding suburban areas. On the other hand, in case 2 (bus-to-bus and bus-walking-bus connections) mostly high connectivity can be found inside the CBD and in the west and north-east parts of the region (basically the municipalities of Vancouver, Richmond and Burnaby), while medium to low connectivity appears progressively in the surrounding suburban areas.
Moreover, when comparing cases 1 and 2 the following can be concluded: 1) connectivity values in case 2 are always higher than in case 1 for all proposed indicators, 2) high values of connectivity spread over more TAZs in case 2 than in case 1 and 3) connectivity decreases gradually when moving away from the CBD and the west part of the region (the municipality of Vancouver) towards the suburban areas.

The zonal transit indicators obtained to characterize the GVRD bus system were included as explanatory variables in ridership and safety models. First, in the case of ridership models only five transit indicators presented a positive linear relationship with ridership: local index of transit availability LITA, coverage σ, structural connectivity ρ, proposed degree of connectivity γ and proposed complexity β. Based on the estimated values of goodness of fit (R² and t-stat), the most influential indicators are the degree of connectivity γ and the LITA, followed by the complexity β, structural connectivity ρ and coverage σ.

Seven multiple linear regressions were developed (see Equations 4.1 to 4.7). They all suggest that the five indicators have an impact on ridership and that they should be maximized when performing transit network design. The goodness-of-fit values obtained from the models (R² between 0.61 and 0.69) are acceptable, but they suggest that a zonal level of aggregation might not be the best way to capture the impact of the transit indicators on transit ridership. The values of ridership and transit indicators vary greatly among the TAZs, making it difficult to obtain a well-fitted model. Additionally, the value of the model intercept in the seven regressions is somewhat high, showing that the variability due to other factors which affect ridership but were not included as variables in the models is being captured in the model intercept.
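As a small worked illustration of the ridership regressions summarized above, the sketch below evaluates Equation 4.1 for one zone and recomputes the adjusted R² from the reported R², the number of observations and the number of parameters. The zone values for σ and β″ are hypothetical, the function names are not from the thesis, and the adjusted R² formula is the standard one, assuming that the reported number of parameters includes the intercept.

```python
def tctp_case1_beta(sigma: float, beta: float) -> float:
    """Equation 4.1 (case 1): TCTP = 5.149 + 103.821*sigma + 50.04*beta."""
    return 5.149 + 103.821 * sigma + 50.04 * beta


def adjusted_r2(r2: float, n_obs: int, n_params: int) -> float:
    """Standard adjusted R-squared, with n_params counting the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


# Hypothetical zone: coverage 0.10 and proposed complexity 0.05 (assumed values).
print("predicted TCTP:", round(tctp_case1_beta(0.10, 0.05), 3))

# Reported statistics from Tables 4.6 and 4.8.
print("adjusted R2, sigma and beta model:", round(adjusted_r2(0.671, 572, 3), 3))
print("adjusted R2, LITA model:", round(adjusted_r2(0.611, 572, 2), 3))
```

Because the reported R² values are themselves rounded, the recomputed adjusted values can differ from the tabulated ones in the third decimal place.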
The second research objective was to develop empirical tools useful for transit-oriented safety planning. The conclusions reached include the following.

36 collision prediction models were developed to explain total collisions (vehicle and transit) over a period of three years, for the urban land use type only (485 of the 577 zones). A Generalized Linear Regression (GLM) technique was used, assuming a Negative Binomial error structure. The models were developed using three different variables as the main exposure variable: vehicle and transit kilometers traveled VKT, vehicle kilometers traveled VHKT and total lane kilometers traveled TLKM. Additionally, explanatory variables were split into four themes: 1) transit infrastructure, which refers to bus system elements physically present in the roads; variables in this category could be bus stop density and the number of near-sided, far-sided or mid-block stations, among others; 2) transit network topology, which includes variables counting vertices, edges or loops (connectivity); 3) transit route design, which includes elements of the network design that can influence system performance, such as number of routes, frequency, coverage and connectivity, among others; and 4) transit performance and operations. Performance can be measured as the quality of the service provided or the usage of the service, among others. Operations indicators refer to transit facility management elements that can improve system performance; examples include bus route frequencies and service time span, among others.

Tables 4.9, 4.10 and 4.11 present models that predict total collisions (auto and transit), together with their goodness of fit summary statistics. The models revealed that collisions are positively associated with the main exposure variables (VKT, VHKT or TLKM). Moreover, the main exposure variables VKT and VHKT were the most influential variables among all models.
The TLKM variable, in contrast, was not the most influential exposure variable. The models showed that decreased collision occurrence was related to increases in the following explanatory variables: interstation spacing IS, average edge length, and the number of far-sided FS and mid-block MB bus stops. Conversely, the models revealed that increased collisions were related to increases in the number of stops NSTOP, number of routes NR, bus stop density BDEN, coverage σ, overlapping degree OD, the connectivity indicators (structural connectivity ρ, proposed degree of connectivity γ and proposed complexity β), transit commuting trips per capita TCPC, AM peak route frequency fAM, service time span T and the local index of transit availability LITA. Despite this result, it cannot be concluded that improving the transit network (by increasing the number of bus stops, coverage, connectivity or any of the other explanatory variables mentioned) will lead to an increase in collision frequency, because improving transit will likely reduce auto usage and, as a result, the main exposure variables (VKT and VHKT). Although the percentage by which the listed transit variables change as the exposure variables change (i.e. the elasticity) was not studied in this research, the expected outcome is that improving transit (by increasing the listed transit variables) will generate an overall positive effect on safety.

Another important observation is that the model parameters for all indicators are higher in case 2 than in case 1, while the leading constant term is slightly higher in case 1 than in case 2. The overall effect implies that the models will predict a higher collision occurrence in case 2 than in case 1. Thus, ignoring walking transfers between bus stops as part of the transit network can underestimate the actual collision frequency.
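The argument above, that a positive coefficient on a transit variable need not imply more collisions once exposure falls, can be illustrated numerically. The sketch below assumes a multiplicative log-link form that is commonly used for macro-level collision prediction models; the functional form, the function names and every coefficient value are assumptions made for illustration and are not the fitted models or estimates of this research.

```python
import math


def predicted_collisions(exposure, indicators, a0, a1, betas):
    """Mean collisions under an assumed log-link form:
    a0 * exposure**a1 * exp(sum of beta_j * x_j)."""
    linear = sum(betas[name] * value for name, value in indicators.items())
    return a0 * exposure ** a1 * math.exp(linear)


def nb_variance(mu, kappa):
    """Negative Binomial variance in the common mu + mu^2 / kappa form."""
    return mu + mu ** 2 / kappa


# Hypothetical coefficients (NOT estimates from this research).
a0, a1 = 0.01, 0.8
betas = {"connectivity": 0.5}

# Baseline zone.
base = predicted_collisions(1000.0, {"connectivity": 0.3}, a0, a1, betas)

# Connectivity improves while exposure is held fixed: prediction rises.
fixed_exposure = predicted_collisions(1000.0, {"connectivity": 0.4}, a0, a1, betas)

# Connectivity improves and auto use drops by an assumed 10 percent:
# with these illustrative numbers the prediction falls below the baseline.
reduced_exposure = predicted_collisions(900.0, {"connectivity": 0.4}, a0, a1, betas)

print(base, fixed_exposure, reduced_exposure)
print(nb_variance(base, kappa=2.0))
```

Whether the net effect is a reduction in practice depends on the elasticity between the transit variables and exposure, which, as stated above, was not studied; the 10 percent drop is purely an assumption.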
5.2 Contribution

The contributions of this research can be stated as follows:

• A methodology to transform metro systems into networks, so that they could be analyzed with graph theory based indicators, was developed by Derrible and Kennedy in 2008. An adaptation of that methodology, to redraw bus systems as graphs, was proposed and developed in this study. The methodology accounted for bus specific characteristics such as the predominance of one-way routes and walking transfers between bus stops, among others. Additionally, since bus systems are more complex than metro systems, a technique to divide the overall bus network into smaller bus networks at the zonal level was also proposed. The zonal analysis of bus networks allowed for an understanding of the spatial distribution of transit indicators and the comparison of indicators among the zones.

• There is an extensive list of transit indicators already developed with the support of graph theory. Indicators have been widely used to characterize and compare metro networks around the world, but not other types of transit (in particular surface systems). In this research, however, current and proposed transit indicators were used to characterize a bus system. Moreover, transit indicators were estimated at the zonal level for the first time.

• Traditional graph theory based connectivity indicators (β and γ) were adapted to address two issues: 1) the lack of inclusion of bus operational features (e.g. frequency of routes, speed, distance between vertices, capacity) when assessing connectivity and 2) the fact that including the number of multiple edges in the connectivity calculations can produce misleading results when ranking various networks. Thus an adaptation of γ and β was attempted, in order to offer indicators which capture the effect of multiple edges appropriately and include bus operational attributes.
Due to the limitations of data availability for this research, only the AM peak frequency of bus routes was included to account for the bus operational attributes. The proposed indicators consisted of the product of two variables: 1) the weighted average of the number of edges based on bus route frequency, which measures bus operational features, and 2) a traditional connectivity indicator (either γ or β) estimated using the number of single edges as input, which measures the configuration.

• Transit indicators were included as explanatory variables in the ridership and safety models. Multiple linear regressions were developed for ridership, and collision prediction models were developed using a Generalized Linear Regression (GLM) technique with a Negative Binomial error structure.

5.3 Recommendations for Future Research

In addition to the proposed research contributions, various topics can be recommended for future research in transit and safety planning. Some of these topics are listed below.

Transit Indicators Development

The new transit indicators developed appeared to generate a better assessment of connectivity. The proposed connectivity indicators were adapted to account for bus operational characteristics. However, due to data and time constraints in this research, only the AM peak frequency was included as an operational attribute. It would be interesting to explain connectivity using other transit operational variables such as distance between nodes, capacity and speed, among others. Moreover, network configuration (the network links being supplied) could be compared with the origin and destination (OD) matrix (travel demand) as a measure of connectivity.

As mentioned before, two connectivity indicators were proposed in this study: γ and β. The indicators were estimated for the GVRD bus system at a zonal level, which was a somewhat complicated task.
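The structure of the proposed indicators, a frequency-based weight multiplied by a traditional indicator computed on single edges, can be sketched for one zone as follows. The traditional β (edges per vertex) and γ (edges over the planar maximum 3(v − 2)) follow their standard graph theory definitions. The weighting function, however, is only one possible reading of "the weighted average of the number of edges based on bus route frequency"; its exact form, the function names and the example zone are assumptions and should not be taken as the formulation used in this research.

```python
def beta_index(edges, vertices):
    """Traditional complexity indicator: edges per vertex."""
    return edges / vertices


def gamma_index(edges, vertices):
    """Traditional degree of connectivity for a planar graph: e / (3 * (v - 2))."""
    return edges / (3 * (vertices - 2))


def frequency_weight(link_frequencies):
    """Frequency-weighted average number of parallel edges per link.

    ASSUMED form. link_frequencies maps each single edge (u, v) to the list
    of AM peak frequencies of the routes running on it; the list length is
    the number of multiple edges on that link.
    """
    weighted = sum(sum(freqs) * len(freqs) for freqs in link_frequencies.values())
    total = sum(sum(freqs) for freqs in link_frequencies.values())
    return weighted / total


# Hypothetical zone with four stops and four links (frequencies in buses/hour).
zone_links = {
    ("A", "B"): [6, 4],
    ("B", "C"): [2],
    ("C", "D"): [8, 4, 4],
    ("A", "D"): [2],
}
vertices = {stop for link in zone_links for stop in link}
single_edges = len(zone_links)  # multiple edges are counted once here

weight = frequency_weight(zone_links)
gamma_proposed = weight * gamma_index(single_edges, len(vertices))
beta_proposed = weight * beta_index(single_edges, len(vertices))
print(weight, gamma_proposed, beta_proposed)
```

Keeping the configuration term on single edges and moving route multiplicity into the weight reflects the stated aim of avoiding misleading rankings caused by counting multiple edges directly.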
A simpler but fascinating application, however, would be to apply the proposed indicators to characterize and compare bus networks around the world. It would be very useful to compare traditional connectivity indicators with these new indicators and produce additional guidelines to improve bus network design.

Transit Indicators Applied to Ridership and Collision Prediction Models (CPMs)

The application of the ridership models and CPMs already developed can allow for the evaluation of transit demand and road safety in regional transportation plans in the GVRD. The goal would be to test the models produced in this study and draft guidelines for their use by practitioners. The application of the ridership and safety models could facilitate decision making for transit and safety planning.

In this research the relationship between total collisions (transit and vehicle) and transit indicators was studied. However, transit-only collision data were not available. Therefore, it would also be interesting to analyze the influence of transit indicators on transit-only collisions.

Another idea for further research would be to refine the CPMs produced so far by including spatial effects. According to Wang and Abdel-Aty [27] and El-Basyouny and Sayed [6], the use of basic modeling with spatially correlated data can produce biased estimators and invalid parameters. Moreover, considering spatial effects improves the models' goodness of fit and helps explain model covariates which would not be considered significant or influential otherwise.

An extensive list of transit indicators was used as explanatory variables for both ridership and safety models. Nevertheless, there are still some indicators which were not estimated and that may also be associated with ridership and/or safety. For example, it would be worthwhile to study the effect of directness (see section 2.1.6), network diameter and node range of influence.
Specifically, it would be very interesting to analyze the influence of the directness indicator, as this indicator is positively correlated with ridership according to Derrible and Kennedy [5] and, intuitively, it should be inversely related to collision frequency. Thus, focusing on maximizing directness during transit network design might be a possible way to help reduce collisions.

Bibliography

[1] W.R. Black. Transportation: A Geographical Analysis. Guilford Press, New York, NY, 2003.
[2] Census Canada. Statistics from the 1996 Census. Government of Canada, Ottawa, Canada, 1996.
[3] C. Cheung, A. Shalaby, B. Persaud, and A. Hadayeghi. Models for safety analysis of road surface transit. Transportation Research Record, (2063):168–175, 2008.
[4] P. DeLeur and T. Sayed. A framework to proactively consider safety within the road planning process. Canadian Journal of Civil Engineering, (30):711–719, 2003.
[5] S. Derrible and C. Kennedy. Characterizing metro networks: State, form, and structure. Transportation, 37(2):275–297, 2010.
[6] K. El-Basyouny and T. Sayed. Urban arterial accident prediction models with spatial effects. Transportation Research Record, (2102):27–33, 2009.
[7] W.L. Garrison and D.F. Marble. The structure of transportation networks. Transportation Center, Northwestern University, Evanston, IL, 1962.
[8] D. Gattuso and E. Miriello. Compared analysis of metro networks supported by graph theory. Networks and Spatial Economics, 5(4):395–414, 2005.
[9] A. Hadayeghi, A. Shalaby, and B. Persaud. Macro-level accident prediction models for evaluating safety of urban transportation systems. Presented at the Transportation Research Board 2003 Annual Meeting, January, Washington D.C.
[10] E. Hauer, J.C.N. Ng, and J. Lovell. Estimation of safety at signalized intersections. Transportation Research Record, (1185):48–61, 1988.
[11] G. Ho and M. Guarnaschelli. Developing a road safety module for the regional transportation model, technical memorandum one: framework. ICBC, 1998.
[12] P. Jovanis. Analysis of bus transit accidents: Empirical, methodological and policy issues. Transportation Research Record, (1322):17–28, 1991.
[13] K.J. Kansky. Structure of Transportation Networks: Relationships Between Network Geometry and Regional Characteristics. University of Chicago Press, Chicago, IL, 1963.
[14] R. Kulmala. Safety at rural three and four arm junctions: Development and Application of Accident Prediction Models. Dissertation for the degree of Doctor of Technology, Technical Research Centre of Finland, VTT Publication 233, 1995.
[15] F. Ladron de Guevara, S. Washington, and J. Oh. Forecasting crashes at the planning level: A simultaneous negative binomial crash model applied in Tucson, Arizona. Presented at the Transportation Research Board 2004 Annual Meeting, January, Washington D.C.
[16] F. Lin and F. Navin. Errors in transportation planning. Unpublished Research Paper, Civil Engineering Department, University of British Columbia, Vancouver, Canada.
[17] G.R. Lovegrove and T. Sayed. Macro-level collision prediction models for evaluating neighbourhood traffic safety. Canadian Journal of Civil Engineering, (33):609–620, 2006.
[18] A. Meakes, R. Hariwal, and W. Tan. Vancouver transit accessibility: Frequency, capacity and service. Unpublished Research Paper, Geography Department, University of British Columbia, Vancouver, Canada, 2010.
[19] S. Miaou and H. Lum. Modelling vehicle accident and highway geometric design relationships. Accident Analysis and Prevention, Elsevier Ltd, pages 689–709, 1993.
[20] A. Musso and V.R. Vuchic. Characteristics of metro networks and methodology for their evaluation. Transportation Research Record, (1162):22–33, 1985.
[21] United Nations. World Urbanization Prospects: The 2007 Revision. United Nations, New York, USA, 2008.
[22] F. Pennycook, R. Barrington-Craggs, D. Smith, and S. Bullock. Mapping transport and social exclusion in Bradford. http://www.geog.ubc.ca/courses/geog471/classof07/transit/Intro.htmld, 2001.
[23] B.N. Persaud, D. Lord, and J. Palmisano. Calibration and transferability of accident prediction models for urban intersections. Transportation Research Record, (1784):57–64, 2002.
[24] L.F. Rodriguez. Accident Prediction Models for Unsignalized Intersections. Dissertation for the degree of Master of Applied Science, 1998.
[25] T. Rood. The local index of transit availability, an implementation manual. Local Government Commission, 1998.
[26] Z. Sawalha and T. Sayed. Evaluation of safety of urban arterial roadways. Journal of Transportation Engineering, (127):151–158, 2001.
[27] X. Wang and M. Abdel-Aty. Temporal and spatial analyses of rear-end crashes at signalized intersections. Accident Analysis and Prevention, Elsevier, (38):1137–1150, 2006.
[28] K. Wiley, H. Maoh, and P. Kanaroglou. Improved methods to explore and model the level of service of urban public transit. Unpublished Research Paper, School of Geography and Earth Sciences, McMaster University, Hamilton, Canada, 2010.
Item Metadata
Title | Graph theory based transit indicators applied to ridership and safety models |
Creator | Quintero-Cano, Liliana |
Publisher | University of British Columbia |
Date | 2011 |
Date Issued | 2011-10-21 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Collection | Electronic Theses and Dissertations (ETDs) 2008+ |
Date Available | 2011-10-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution 3.0 Unported |
DOI | 10.14288/1.0050719 |
Degree | Master of Applied Science - MASc |
Program | Civil Engineering |
Affiliation | Applied Science, Faculty of; Civil Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2011-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by/3.0/ |
URI | http://hdl.handle.net/2429/38152 |
Aggregated Source Repository | DSpace |