Accident prediction models for unsignalized intersections 1998
Item Metadata
Title | Accident prediction models for unsignalized intersections |
Creator | Rodríguez, Luis F. (Luis Felipe) |
Date Created | 2009-05-05T17:11:04Z |
Date Issued | 2009-05-05T17:11:04Z |
Date | 1998 |
Extent | 4113956 bytes |
Genre | Thesis/Dissertation |
Type | Text |
File Format | application/pdf |
Language | Eng |
Collection | Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
Date Available | 2009-05-05T17:11:04Z |
DOI | 10.14288/1.0050154 |
Degree | Master of Applied Science - MASc |
Program | Civil Engineering |
Affiliation | Applied Science, Faculty of |
Degree Grantor | University of British Columbia |
Graduation Date | 1998-05 |
Campus | UBCV |
Scholarly Level | Graduate |
URI | http://hdl.handle.net/2429/7882 |
Aggregated Source Repository | DSpace |
Digital Resource Original Record | https://open.library.ubc.ca/collections/831/items/1.0050154/source |
Full Text
ACCIDENT PREDICTION MODELS FOR UNSIGNALIZED INTERSECTIONS

by

LUIS FELIPE RODRIGUEZ

B.Sc. (Civil Engineering), Universidad de los Andes, Bogota, Colombia, 1990
M.Sc. (Civil Engineering), Universidad de los Andes, Bogota, Colombia, 1992

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF CIVIL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
APRIL, 1998
© Luis Felipe Rodriguez, 1998

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that the permission for the extensive copying of this thesis for scholarly purpose may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Civil Engineering
The University of British Columbia
2324 Main Mall
Vancouver, B.C. Canada, V6T 1Z4

ABSTRACT

The main objective of this thesis is to develop Accident Prediction Models (APM) for estimating the safety potential of urban unsignalized (T and 4-leg) intersections in the Greater Vancouver Regional District (GVRD) and Vancouver Island on the basis of their traffic characteristics. The models are developed using the generalized linear regression modeling (GLIM) approach, which addresses and overcomes the shortcomings associated with the conventional linear regression approach. The safety predictions obtained from GLIM models can be refined using the Empirical Bayes approach to provide more accurate, site-specific safety estimates. The use of the complementary Empirical Bayes approach can significantly reduce the regression to the mean bias that is inherent in observed accident counts.

The thesis made use of sample accident and traffic volume data corresponding to unsignalized (both T and 4-leg) intersections located in urban areas of the GVRD and Vancouver Island. The data included a total of 427 intersections located in the cities of Victoria, Surrey, Nanaimo, Coquitlam, Burnaby and Vancouver. The information available for each intersection included the total number of accidents in the 1993-1995 period, traffic volumes for both major and minor roads given in Average Annual Daily Traffic (AADT), and type of intersection (T or 4-leg).

Four categories of models were developed in this study: (1) models for the total number of accidents; (2) separate models for T and 4-leg intersections; (3) separate models for different regions (Vancouver Island, the Lower Mainland and Surrey); and (4) a model for Surrey including intersection control.

Five applications of APM were used in this thesis. Four of them relate to the use of the Empirical Bayes refinement: identification of accident-prone locations, developing critical accident frequency curves, ranking the identified accident-prone locations, and before and after safety evaluation. The fifth application provides a safety-planning example, comparing the safety of a 4-leg intersection to two staggered T-intersections. These applications show the importance of implementing APM as a tool to assess traffic safety reliably and to design different safety strategies.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER I: INTRODUCTION
  1.1 Background
  1.2 Thesis Structure
CHAPTER II: LITERATURE REVIEW
  2.0 Introduction
  2.1 Shortcomings Associated with Conventional Linear Regression Models
  2.2 Generalized Linear Models (GLIM)
  2.3 Testing the Models Significance
  2.4 Model Structure
  2.5 Location Specific Prediction: The Empirical Bayes Refinement
  2.6 Previous work
  2.7 Conclusion
CHAPTER III: DATA COLLECTION AND MODEL DEVELOPMENT
  3.0 Introduction
  3.1 Data Collection
    3.1.1 Accident and Traffic Volume Data
    3.1.2 Outlier Analysis
  3.2 Model Development
    3.2.1 Model for the Total Number of Accidents
    3.2.2 Models for T and 4-leg Intersections
    3.2.3 Regional Models
    3.2.4 Effect of Intersection Control Type
  3.3 Comparison with Previous Results
  3.4 Conclusion
CHAPTER IV: STATISTICAL CONSIDERATIONS
  4.0 Introduction
  4.1 Poisson vs. Negative Binomial Distribution Error Structure
  4.2 Approaches for Estimating the Negative Binomial Distribution Parameter κ
  4.3 Conclusion
CHAPTER V: APPLICATIONS
  5.0 Introduction
  5.1 Empirical Bayes Refinement
  5.2 Identification of Accident Prone Locations
  5.3 Critical Accident Frequency Curves
  5.4 Ranking of Accident Prone Locations
  5.5 Before and After Studies
  5.6 Safety Comparison of Staggered T and 4-leg Intersections
  5.7 Conclusion
CHAPTER VI: CONCLUSIONS AND RECOMMENDATIONS
  6.1 Conclusions
  6.2 Recommendations for further research
BIBLIOGRAPHY
APPENDIX I: RESULTS OF OUTLIERS IDENTIFICATION
APPENDIX II: PREDICTION RATIO VS. ACCIDENT FREQUENCY
APPENDIX III: GLIM SESSION FOR ESTIMATING APM
APPENDIX IV: GLIM SESSION FOR DIFFERENT NEGATIVE BINOMIAL METHODS

LIST OF TABLES

Table 3.1 Summary of Accident, Intersection Control and Traffic Volume Data
Table 3.2 Statistical Summary of Accidents
Table 3.3 Identification of Outliers for Total Model
Table 3.4 Model for the Total Number of Accidents
Table 3.5 Models for T and 4-leg Intersections
Table 3.6 Total Model Including the Effect of Intersection Type
Table 3.7 Intersection Type Model: 2 Separate Models vs Single Model
Table 3.8 Regional Models
Table 3.9 Surrey Total Model with Control Type
Table 4.1 Comparison between Poisson and Negative Binomial Distribution
Table 4.2 Results of Different Negative Binomial Methods in the Total Model
Table 5.1 Number of Accident Prone Locations
Table 5.2 Ranking of APLs for The Vancouver Island Model
Table 6.1 Summary of Accident Prediction Models
Table AI-1 Identification of Outliers for Total Model with Intersection Type
Table AI-2 Identification of Outliers for T-Intersection Model
Table AI-3 Identification of Outliers for 4-Leg Intersection Model
Table AI-4 Identification of Outliers for Vancouver Island Model
Table AI-5 Identification of Outliers for Lower Mainland Model
Table AI-6 Identification of Outliers for Surrey Total Model
Table AI-7 Identification of Outliers for Surrey Total Model with Intersection Control Type

LIST OF FIGURES

Figure 2.1 Empirical Bayes' Estimate for Different κ Values
Figure 3.1 Identification of the Highest Cook's Distance Values for Total Model
Figure 3.2 Total Model: Observed vs. Predicted Number of Accidents
Figure 3.3 Total Model: Predicted Accidents vs Estimated Variance
Figure 3.4 T-Intersection Model: Observed vs. Predicted Number of Accidents
Figure 3.5 T-Intersection Model: Predicted Accidents vs Estimated Variance
Figure 3.6 4-leg Intersection Model: Observed vs. Predicted Number of Accidents
Figure 3.7 4-leg Intersection Model: Predicted Accidents vs Estimated Variance
Figure 3.8 Total Model with the Effect of Intersection Type: Observed vs. Predicted Number of Accidents
Figure 3.9 Total Model with the Effect of Intersection Type: Predicted Accidents vs Estimated Variance
Figure 3.10 Intersection Type Model: Separate vs. Single Models
Figure 3.11 Vancouver Island Model: Observed vs Predicted Number of Accidents
Figure 3.12 Vancouver Island Model: Predicted Accidents vs Estimated Variance
Figure 3.13 Lower Mainland Model: Observed vs Predicted Number of Accidents
Figure 3.14 Lower Mainland Model: Predicted Accidents vs Estimated Variance
Figure 3.15 Surrey Total Model: Observed vs Predicted Number of Accidents
Figure 3.16 Surrey Total Model: Predicted Accidents vs Estimated Variance
Figure 3.17 Comparison of Total Model with Regional Models
Figure 3.18 Surrey Total Model with Control Type: Observed vs Predicted Number of Accidents
Figure 3.19 Surrey Total Model with Control Type: Predicted Accidents vs Estimated Variance
Figure 3.20 Comparison of T-Intersection model with Previous Studies
Figure 4.1 Predicted Accidents using Different Methods to Obtain κ
Figure 5.1 Predicted vs. EB Refined Number of Accidents for Total Model
Figure 5.2 Identification of Accident Prone Locations
Figure 5.3 Critical Curves for Total Model
Figure 5.4 Critical Curves for Vancouver Island Model
Figure 5.5 Critical Curves for Lower Mainland Model
Figure 5.6 Critical Curves for Different Values of κ
Figure 5.7 Comparison of Critical Accidents for Different κ Values
Figure 5.8 Ranking of Top 10 APL for Island Model
Figure 5.9 Staggered T vs 4-leg Intersections Safety Comparison
Figure AI-1 Identification of the Highest Cook's Distance Values for Total Model with Intersection Type
Figure AI-2 Identification of the Highest Cook's Distance Values for T-Intersection Model
Figure AI-3 Identification of the Highest Cook's Distance Values for 4-leg Intersection Model
Figure AI-4 Identification of the Highest Cook's Distance Values for Vancouver Island Model
Figure AI-5 Identification of the Highest Cook's Distance Values for Lower Mainland Model
Figure AI-6 Identification of the Highest Cook's Distance Values for Surrey Total Model
Figure AI-7 Identification of the Highest Cook's Distance Values for Surrey Total Model with Intersection Control Type
Figure AII-1 PR vs. Accident Frequency for Total Model
Figure AII-2 PR vs. Accident Frequency for T-intersection Model
Figure AII-3 PR vs. Accident Frequency for 4-leg intersection Model
Figure AII-4 PR vs. Accident Frequency for Total Model Including Intersection Type
Figure AII-5 PR vs. Accident Frequency for Vancouver Island Model
Figure AII-6 PR vs. Accident Frequency for Lower Mainland Model
Figure AII-7 PR vs. Accident Frequency for Surrey Total Model
Figure AII-8 PR vs. Accident Frequency for Surrey Model with Control Type

ACKNOWLEDGEMENTS

This thesis represents not only my last step toward an M.A.Sc. Degree, but also represents the fulfillment of a dream that I have had for many years, to obtain a graduate degree from a university outside my country. Many people helped me achieve this goal, which has been the most important achievement in my professional career. This has been a real teamwork comprised by my family, friends and the faculty staff, who in different ways supported me during these last two years.

First of all I would like to thank my supervisor, Dr. Tarek Sayed, for all the time he spent in explaining to me all the ideas and theory behind the phenomenon of traffic accident occurrence, and for encouraging me to keep the enthusiasm during the different stages of this research. I am also very grateful to him for having given me the opportunity to work as a research assistant in two of his projects.

I want to thank the Insurance Corporation of British Columbia (ICBC) for providing the financial support to this research and Delcan Corporation for providing the accident data used to develop the accident prediction models. Without them completing this thesis would have been impossible. I am also grateful to the Instituto Colombiano de Fomento para Estudios Tecnicos en el Exterior, ICETEX, for its financial support during my first year in Canada. I also thank my cousin Felipe Hernandez for the whole week that he spent in correcting my English and even sometimes my Spanish.

Last but not least, I want to dedicate this work and milestone to the persons that have given me the greatest support in my life, my father Luis and my mother Cecilia. Thank you guys for showing me the way to get where I am now. I have learned from you that despite all difficulties and sometimes frustrations, there are no impossible goals to reach. I feel very happy to inform you that "lo logre!".
CHAPTER I
INTRODUCTION

1.1 Background

Since the dawn of the automobile age about a century ago, traffic safety problems have been a serious concern: an enormous economic and human toll has been exacted as a result of the public's ongoing love affair with the motor vehicle. It is commonly accepted that there are many costs associated with vehicular mobility such as air pollution, noise, and accidents. However, the economic and social costs associated with road accidents greatly exceed other mobility costs due to the loss of property, injury, pain, grief and deaths attributed to road accidents. In British Columbia, 500 people are killed and 50,000 injured as a result of road accidents. The annual direct claim costs for the Insurance Corporation of British Columbia (ICBC) due to road accidents are estimated to exceed $2 billion (ICBC, 1996 Annual Report), and the related social costs are far higher. Consequently, the importance of reducing the social and economic costs of road accidents cannot be overstated.

Recognizing the traffic safety problem and the importance of reducing the frequency and severity of road accidents, the majority of road authorities have established Road Safety Improvement Programs (RSIPs). The objective of these programs is to identify accident-prone locations, determine possible causes and countermeasures, and implement the most effective countermeasures in order to alleviate the problems at these locations. The success of these RSIPs can be enhanced by developing statistically reliable accident prediction models, which provide accurate estimates of traffic safety at road sections and intersections. These safety estimates can be used in identifying accident-prone locations and evaluating the effectiveness of remedial measures.
The main objective of this thesis is to develop accident prediction models for estimating the safety potential of urban unsignalized intersections as functions of traffic volumes on both major and minor roads, and type of intersection (T and 4-leg). The data used for this thesis included accident records and traffic volume data for intersections located in the GVRD and urban areas of Vancouver Island. The methodology used to derive these models is based on the Generalized Linear Regression Models (GLIM) approach. The GLIM approach addresses and overcomes the problems associated with conventional linear regression. Several researchers have shown that conventional linear regression lacks the distributional property to describe the occurrence of accidents. Some of the potential applications of accident prediction models include identifying and ranking accident-prone locations, before and after safety evaluation, and safety planning.

The work reported in this thesis is part of the ongoing research at the Civil Engineering Department of the University of British Columbia on accident prediction models. Models have been developed for urban signalized intersections (Feng and Sayed, 1997). Currently, models are being developed for rural signalized intersections and for urban and rural corridors.

1.2 Thesis Structure

This thesis is divided into six chapters. Chapter One provides an overview of the thesis and its structure. Chapter Two summarizes previous work on accident prediction models, the theoretical background of the GLIM approach, and its applications to accident prediction models. Chapter Three describes the accident and traffic volume data used and the models developed. Chapter Four discusses several statistical issues related to the GLIM approach. Chapter Five discusses several applications of the models.
The applications include: identification of accident-prone locations; developing critical frequency curves; ranking of accident-prone locations; before-and-after safety evaluation; and the use of the models in safety planning. Chapter Six provides the summary and conclusions of the thesis and suggestions for follow-up work.

CHAPTER II
LITERATURE REVIEW

2.0 Introduction

The relationship between traffic accidents and traffic volumes has been the subject of numerous studies. Most of the earlier studies used the conventional linear regression approach to develop models relating accidents to traffic volumes. However, the past decade has seen significant advances in accident data analysis and modeling. Accident prediction models are no longer limited to the conventional linear regression approach, as more accurate and less restrictive nonlinear models are considered. In addition, the use of the Empirical Bayes approach for refining the estimates obtained from accident prediction models has also been an important development. This chapter describes the statistical theory behind the accident prediction models, as well as previous research and developments.

2.1 Shortcomings Associated with Conventional Linear Regression Models

The conventional linear regression model is defined as follows:

$$ Y_i = a_0 + \sum_{j=1}^{k} a_j x_{ij} + \varepsilon_i \tag{2.1} $$

where,
$Y_i$ = estimated or dependent variable
$a_0, a_j$ = estimated coefficients
$x_{ij}$ = independent variables
$\varepsilon_i$ = estimated error, assumed to be normally distributed

Several researchers (Jovanis and Chang, 1986; Saccomanno and Buyco, 1988; Miaou and Lum, 1993) have shown that conventional linear regression models lack the distributional property to adequately describe random, discrete, non-negative, and typically sporadic events, which are all characteristics of traffic accidents. Jovanis and Chang (1986) identified three shortcomings associated with the assumption of a normal distribution error structure.
The first shortcoming is found in the relationship between the mean and the variance of accident frequency. Jovanis and Chang (1986) demonstrated that as the volume of traffic increases, so does the variance of accident frequency. Under a normal distribution assumption, the variance remains constant. The second shortcoming is associated with the non-negativity of accident occurrence. Predicted negative values under conventional linear models might occur when there are low accident frequencies in the data set. A way to avoid this problem is to use non-linear models, which are linearized by a logarithmic transformation in order to estimate their parameters. The third problem is related to the non-normality of the error distribution, due to the non-negativity and the small values of the discrete dependent variable.

Jovanis and Chang (1986) found that the best way to overcome these problems is to assume a Poisson distribution error structure. Their results were demonstrated by modeling accidents at highway sections in Indiana.

Miaou and Lum (1993) identified the same shortcomings when comparing four accident prediction models applied to trucks on highways. Two of these models were developed under the assumption of a normal distribution error structure, while the others were assumed to be Poisson distributed. It was found that predicted values from models using the Poisson distribution assumption were much closer to the observed values and their estimated coefficients had higher t-statistics, which denote higher significance. For the normally distributed models, it was found that some of the estimated coefficients had signs contrary to expectation. These results confirmed the shortcomings associated with the conventional linear regression technique and its limited applicability for developing accident prediction models.
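The second shortcoming (negative predictions at low accident frequencies) can be illustrated with a small sketch. The five volume and accident values below are hypothetical numbers chosen for illustration only; they are not taken from the thesis data, and the helper name `fit_line` is likewise an assumption of this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sites: traffic volume (vehicles/day) and accident counts.
volumes = [1000, 2000, 5000, 10000, 20000]
accidents = [0, 0, 1, 4, 9]

intercept, slope = fit_line(volumes, accidents)
low_volume_prediction = intercept + slope * 1000

# The fitted straight line dips below zero at the lowest volume,
# which is impossible for an accident count.
print(f"intercept={intercept:.3f}, slope={slope:.6f}")
print(f"predicted accidents at 1000 veh/day: {low_volume_prediction:.3f}")
```

With these illustrative numbers the fitted intercept is negative, so the straight line predicts fewer than zero accidents at the lowest volume. A multiplicative model fitted on a logarithmic scale cannot produce such a value.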
2.2 Generalized Linear Models (GLIM)

As seen in the previous section, GLIM has the advantage of overcoming the shortcomings associated with the conventional linear regression approach. GLIM also has the flexibility of assuming different error distributions and link functions that allow the conversion of non-linear models into linear models. Given these advantages, the GLIM approach is used in this thesis.

The GLIM approach used herein is based on the work of Kulmala (1995) and Hauer et al. (1988). Let $Y$ be a random variable that describes the number of accidents at an intersection in a specific time period, and let $y$ be the observation of this variable during a period of time. The mean of $Y$ is $\Lambda$, which can also be regarded as a random variable. Then for $\Lambda = \lambda$, $Y$ is Poisson distributed with parameter $\lambda$:

$$ P(Y = y \mid \Lambda = \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}; \quad E(Y \mid \Lambda = \lambda) = \lambda; \quad Var(Y \mid \Lambda = \lambda) = \lambda \tag{2.2} $$

Since each site has its own regional characteristics with a unique mean accident frequency $\Lambda$, Hauer et al. (1988) have shown that for an imaginary group of sites with similar characteristics, $\Lambda$ follows a gamma distribution (with parameters $\kappa$ and $\kappa/\mu$), where $\kappa$ is the shape parameter of the distribution. That is:

$$ f_{\Lambda}(\lambda) = \frac{(\kappa/\mu)^{\kappa} \lambda^{\kappa - 1} e^{-(\kappa/\mu)\lambda}}{\Gamma(\kappa)} \tag{2.3} $$

with a mean and variance of:

$$ E(\Lambda) = \mu; \quad Var(\Lambda) = \frac{\mu^{2}}{\kappa} \tag{2.4} $$

Kulmala (1995) has also shown that the point probability function of $Y$ based on equations (2.3) and (2.4) is given by the negative binomial distribution:

$$ P(Y = y) = \frac{\Gamma(\kappa + y)}{\Gamma(\kappa)\, y!} \left( \frac{\kappa}{\kappa + \mu} \right)^{\kappa} \left( \frac{\mu}{\kappa + \mu} \right)^{y} \tag{2.5} $$

with an expected value and variance of:

$$ E(Y) = \mu; \quad Var(Y) = \mu + \frac{\mu^{2}}{\kappa} \tag{2.6} $$

As shown in equation (2.6), the variance of observed accidents for the entire sample has two sources: the second term ($\mu^{2}/\kappa$) from the variance of the predicted number of accidents, and the first term ($\mu$) from the variation of the number of accidents (Kulmala, 1995). Notice that when $\kappa \to \infty$, the variance in equation (2.6) equals the mean, which is identical to the Poisson distribution.

As described earlier, for the GLIM approach, the error structure that best fits accident occurrence is usually assumed to be Poisson or negative binomial. The main advantage of the Poisson error structure is the simplicity of the calculations, because the mean and variance are equal and its method of calculation is readily included in the GLIM software package (NAG, 1994). However, this advantage is also a limitation. It has been shown (Kulmala and Roine, 1988; Kulmala, 1995) that most accident data are likely to be overdispersed (the variance is greater than the mean), which indicates that the negative binomial distribution is the more realistic assumption.

Miaou and Lum (1993) identified three possible sources of overdispersion in accident data. The first is related to omitted variables that explain accident occurrence. Traffic accidents depend on numerous variables including geometric characteristics, weather, time of day, and human factors. Many of these variables are not discernible from accident records. The second possible source of overdispersion is related to uncertainties in vehicle exposure data, derived from errors during data collection. The third source comes from non-homogeneous roadway environments, which can explain why accident rates differ between daylight and night, or between rainy and sunny days.

The main difficulty associated with using the negative binomial distribution error structure is the determination of the shape parameter $\kappa$. Kulmala (1995) proposed an iterative approach using the method of moments. The GLIM software package (V 4.0) includes a macro library in which the parameter $\kappa$ is calculated by three different iterative methods: the maximum likelihood, the mean deviance estimate, and the mean $\chi^{2}$ estimate (NAG, 1996).
A comparison between the four methods will be provided in Chapter 4. The method of maximum likelihood is used in this thesis.

Bonneson and McCoy (1993) proposed a methodology to decide whether to use a Poisson or negative binomial error structure. First, the model parameters are estimated based on a Poisson distribution error structure. Secondly, a dispersion parameter ($\sigma_d$) is calculated. The dispersion parameter is defined as:

$$ \sigma_d = \frac{\text{Pearson } \chi^{2}}{n - p} \tag{2.7} $$

where $n$ is the number of observations and $p$ is the number of model parameters (the Pearson $\chi^{2}$ test will be described in detail in the next section). If $\sigma_d$ is greater than 1.0, then the data have greater dispersion than is explained by the Poisson distribution, and a further analysis using a negative binomial distribution is required. If $\sigma_d$ is near 1.0, then the assumed error structure approximately fits the Poisson distribution. This method has the advantage of testing the model under the Poisson distribution first, which is easier to estimate than the negative binomial distribution.

2.3 Testing the Models Significance

The significance of GLIM models is usually assessed using the Scaled Deviance (SD) and the Pearson $\chi^{2}$ test. The SD is a likelihood ratio statistic that measures the difference between the log likelihood of the saturated model and that of the studied model (Kulmala, 1995). The general equation for SD is defined as follows:

$$ SD = 2 \log l(y, y) - 2 \log l(E(\Lambda), y) \tag{2.8} $$

where $\log l(E(\Lambda), y)$ is the natural logarithm of the probability density function. McCullagh and Nelder (1983) have shown that for the Poisson distribution the SD is defined as:

$$ SD = 2 \sum_{i=1}^{n} y_i \ln \left( \frac{y_i}{E(\Lambda_i)} \right) \tag{2.9} $$

and for the negative binomial distribution the SD is defined as:

$$ SD = 2 \sum_{i=1}^{n} \left[ y_i \ln \left( \frac{y_i}{E(\Lambda_i)} \right) - (y_i + \kappa) \ln \left( \frac{y_i + \kappa}{E(\Lambda_i) + \kappa} \right) \right] \tag{2.10} $$

The scaled deviance is asymptotically $\chi^{2}$ distributed with $n - p - 1$ degrees of freedom.
Therefore, for a well-fitted model with appropriate link function, error distribution and functional form, the expected value of SD will approximately equal the number of degrees of freedom (Maycock and Hall, 1984).

Another measure to assess the significance of the GLIM models is the Pearson $\chi^{2}$ statistic, defined as (Bonneson and McCoy, 1993):

$$ \text{Pearson } \chi^{2} = \sum_{i=1}^{n} \frac{\left[ y_i - E(\Lambda_i) \right]^{2}}{Var(y_i)} \tag{2.11} $$

where $y_i$ is the observed number of accidents at intersection $i$, $E(\Lambda_i)$ is the predicted number of accidents obtained from the accident prediction model, and $Var(y_i)$ is the variance of the observed accidents defined in equations (2.2) and (2.6) for the Poisson and negative binomial distributions, respectively. The Pearson $\chi^{2}$ statistic follows the $\chi^{2}$ distribution with $n - p - 1$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of model parameters.

In addition, graphical methods are useful subjective measures of the model goodness of fit. One of them is to plot the predicted accident frequency versus the observed accident frequency. A well-fitted model should have all points in the graph clustered symmetrically around the 45° line. A second graphical method is to plot the average of squared residuals versus the predicted accident frequency. For a well-fitted model, all points should be around the variance function line as defined in equation (2.6) for the negative binomial distribution. Another graphical method is to calculate the Prediction Ratio (PR) and plot it against the predicted values. PR is defined as the normalized residual, which is the difference between the predicted and observed accidents, divided by the standard deviation (Bonneson and McCoy, 1997). PR can be calculated according to the following equation:

$$ PR_i = \frac{E(\Lambda_i) - y_i}{\sqrt{Var(y_i)}} \tag{2.12} $$

For a well-fitted model, $PR_i$ should be clustered around the zero axis in a Predicted Accidents vs. PR graph.

Finally, the t-ratio test is used to measure the statistical significance of the variable coefficients.
The t-ratio is defined as the ratio between the estimated GLIM parameter and its standard error. For a variable to be significant at the 95% level of confidence, the absolute value of the t-ratio should be greater than 1.96. All six tests described in this section were used to assess the significance of the models developed for this thesis.

2.4 Model Structure

Intersection accident prediction models can generally be classified into two types. The first type relates accidents to the sum of traffic flows entering the intersection, while the second relates accidents to the product of traffic flows entering the intersection. The latter type has been shown to be more suitable for representing the relationship between accidents and traffic flows at intersections (Hauer et al., 1988). In this kind of structure, accident frequency is a function of the product of traffic flows, each raised to a specific power (usually less than one). This approach has been used in this thesis. That is:

$$E(A) = a_0 \times V_1^{a_1} \times V_2^{a_2} \qquad (2.13)$$

where,
E(A) = predicted accident frequency
V_1 = major road traffic volume
V_2 = minor road traffic volume
a_0, a_1, a_2 = model parameters

As mentioned earlier, accident occurrence is not a function of traffic flows only, but also of other variables (e.g. weather, intersection type, geometric features, etc.). Kulmala (1995) and Maher and Summersgill (1996) proposed to model these additional variables along with traffic flows as follows:

$$E(A) = a_0 \times V_1^{a_1} \times V_2^{a_2} \times e^{\sum_{j=1}^{m} b_j x_j} \qquad (2.14)$$

where $x_j$ represents any of the m additional variables and $b_j$ is its coefficient.

2.5 Location Specific Prediction: The Empirical Bayes Refinement

There are two types of clues to the safety of a location: its traffic and road geometric design characteristics, and its historical accident data (Hauer, 1992; Brüde and Larsson, 1988). The Empirical Bayes (EB) approach makes use of both clues. The EB approach refines the estimate of the expected number of accidents at a location by combining the observed number of accidents at the location with the predicted number of accidents obtained from the GLIM model, yielding a more accurate, location-specific safety estimate.

The EB estimated number of accidents for any intersection can be calculated by using the following equation (Hauer et al., 1992):

$$EB_{\text{safety estimate}} = \alpha \times E(A) + (1 - \alpha) \times \text{count} \qquad (2.15)$$

where,

$$\alpha = \frac{1}{1 + \dfrac{Var(E(A))}{E(A)}} \qquad (2.16)$$

count = observed number of accidents
E(A) = predicted number of accidents as estimated from the GLIM model
Var(E(A)) = variance of the GLIM estimates

Using the variance of the predicted accidents, Var(E(A)), defined in equation (2.4), equation (2.15) can be rearranged to yield:

$$EB_{\text{safety estimate}} = \left(\frac{\kappa}{\kappa + E(A)}\right) \times E(A) + \left(\frac{E(A)}{\kappa + E(A)}\right) \times \text{count} \qquad (2.17)$$

In addition, the variance of the EB refined estimate can be calculated using the following equation (Kulmala, 1995):

$$Var(EB_{\text{safety estimate}}) = \left(\frac{E(A)}{\kappa + E(A)}\right)^2 \times \kappa + \left(\frac{E(A)}{\kappa + E(A)}\right)^2 \times \text{count} \qquad (2.18)$$

Equation (2.17) shows that the EB refined estimate lies between the observed and the predicted number of accidents, combining both the individual accident history of the location and the GLIM model prediction (Figure 2.1). The κ parameter also plays an important role in the calculation of the EB estimate. Kulmala (1995) showed that for high values of κ, the variance of the predicted accidents is low (equation 2.4), so the uncertainty is small and the EB estimate is closer to the GLIM estimate. Conversely, when κ is low, the variance of the predicted value is high, as is the uncertainty of the GLIM model, and the EB estimate is therefore closer to the observed value. Figure 2.1 shows how the κ value affects the EB estimate.
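The EB refinement in equations (2.15) to (2.18) can be illustrated with a short Python sketch. It assumes that the variance of the prediction is E(A)²/κ, which is the form implied by the rearrangement into equation (2.17); the function names are illustrative only, and the example values (11 observed and 6.88 predicted accidents in three years) are the ones annotated on Figure 2.1.

```python
def eb_weight(predicted, kappa):
    """Weight alpha given to the model prediction (equation 2.16).

    Assumption: Var(E(A)) = E(A)^2 / kappa, the form implied by the
    rearrangement of equation (2.15) into equation (2.17).
    """
    variance = predicted ** 2 / kappa
    return 1.0 / (1.0 + variance / predicted)


def eb_estimate(predicted, count, kappa):
    """EB safety estimate: weighted mix of prediction and observed count."""
    alpha = eb_weight(predicted, kappa)
    return alpha * predicted + (1.0 - alpha) * count


def eb_variance(predicted, count, kappa):
    """Variance of the EB estimate, written as (E/(kappa+E))^2 * (kappa + count)."""
    share = predicted / (kappa + predicted)
    return share ** 2 * (kappa + count)


if __name__ == "__main__":
    # Low kappa pulls the estimate toward the observed count,
    # high kappa pulls it toward the model prediction.
    for kappa in (1.0, 10.0, 100.0):
        print(kappa, round(eb_estimate(6.88, 11, kappa), 2))
```

Under these assumptions the estimate always stays between the prediction and the observed count, which is the behaviour described for Figure 2.1.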
[Figure 2.1 Empirical Bayes' Estimate for Different k Values. Horizontal axis: k value (10 to 100). Reference levels: observed number of accidents (11 acc/3 years) and predicted number of accidents (6.88 acc/3 years).]

In addition to combining the two types of safety clues and providing site-specific safety estimates, it has also been shown that the EB procedure significantly reduces the regression to the mean effects that are inherent in observed accident counts (Brüde and Larsson, 1988). Regression to the mean is a statistical phenomenon by which a randomly large number of accidents for a certain entity during a before period is normally followed by a reduced number of accidents during a similar after period, even if no measures have been implemented (the opposite applies in the case of a randomly small number of accidents).

The EB refinement is important for various applications of GLIM models, such as the identification and ranking of accident-prone locations and the assessment of the effectiveness of safety measures. The EB estimate, combined with reliable GLIM models, has the advantage of overcoming the difficulties associated with defining reference groups to perform before and after studies (Mountain and Fawaz, 1996).

2.6 Previous Work

There are few studies dealing with accident prediction models at junctions, even though most accidents occur at these kinds of locations. Satterthwaite (1981) made an extensive review of over 80 studies dealing with the relationship between traffic accidents and traffic volumes. Most of the models reported in that review consider accidents on road sections, and only 14 of the references deal with accident models at intersections. For accidents at intersections, Satterthwaite (1981) found some non-linear relationships between accidents and traffic volumes at T-intersections located in rural areas.
The proposed models are disaggregated into accidents involving vehicles turning left and right from the minor road (non-through road) to the major road (through road). The relationships found are similar to equation (2.13), but in one study the $a_1$ and $a_2$ coefficients were both found to be approximately 0.5, while in a subsequent study $a_1$ was approximately 1 and $a_2$ was again approximately 0.5. These models were developed during the 1950s and 1960s and were estimated using conventional regression analysis.

Also reported were similar studies conducted at other intersections where traffic control and layout variables were taken into account; however, there is no report related to accident prediction models at urban unsignalized intersections. The review concluded that results concerning accidents at intersections were not consistent and suggested that more research should be done.

Bonneson and McCoy (1993), using data from 125 two-way stop controlled intersections in Minnesota, developed the following model:

$$\text{Accidents/year} = 0.692 \times \left(\frac{AADT_{\text{major road}}}{1000}\right)^{[\text{illegible}]} \times \left(\frac{AADT_{\text{minor road}}}{1000}\right)^{0.83} \qquad (2.19)$$

Using a similar approach, Belanger (1994) developed several models using data from 149 4-leg unsignalized intersections in western Quebec. The models included the "total-accidents model" for different ranges of speed; "accident-type models" such as right angle, rear end, etc.; and models including other variables such as the existence of flashing beacons, sight distance and turning lanes. For instance, the total-accidents model for all speeds developed by Belanger is as follows:

$$\text{Accidents/year} = 0.00193 \times \left(AADT_{\text{major road}}\right)^{[\text{illegible}]} \times \left(AADT_{\text{minor road}}\right)^{0.51} \qquad (2.20)$$

Both the Bonneson and McCoy model and the Belanger model were developed for intersections in rural areas, assuming a negative binomial error distribution.
In a more recent study, Maher and Summersgill (1996), using selected data recorded all over the UK, developed the following model for T-intersections on urban single carriageways based on the negative binomial distribution:

$$\text{Accidents/year} = 0.049 \times \left(\frac{AADT_{\text{major road}}}{1000}\right)^{[\text{illegible}]} \times \left(\frac{AADT_{\text{minor road}}}{1000}\right)^{0.36} \qquad (2.21)$$

In addition, Mountain and Fawaz (1996), using the same approach (negative binomial), derived a model for 390 unsignalized intersections located in 12 UK counties. Out of the 390 intersections, 338 were T-intersections and approximately 35% were located in urban areas. The model developed is as follows:

$$\text{Accidents/year} = 0.141 \times \left(\frac{AADT_{\text{major road}}}{1000}\right)^{0.64} \times \left(\frac{AADT_{\text{minor road}}}{1000}\right)^{0.24} \qquad (2.22)$$

Since this thesis deals only with intersections located in urban areas, only the models in equations (2.21) and (2.22) will be compared with the models developed in this thesis. This comparison is shown in the next chapter.

2.7 Conclusion

Developing accident prediction models has been a concern for the last four decades. During the 1950s, 1960s and 1970s, the models were limited by the use of conventional linear regression analysis, leading to inconsistencies and misinterpretation in describing the occurrence of traffic accidents. The advancements in computer and software technology during the last two decades, and the development of more sophisticated statistical tools, have resulted in the development and release of software packages such as GLIM and SAS, which are capable of solving non-linear regression models by specifying any type of error structure consistent with the data. Several researchers have found that accident occurrence is better described by the negative binomial distribution than by the Poisson distribution, because the negative binomial distribution has been shown to be the most appropriate way to model overdispersion.
With respect to the use of GLIM to develop safety models for unsignalized intersections, only a few studies were found. Most of these studies deal with the rural environment. Most GLIM accident prediction models have been developed during the last 10 years and focus on signalized intersections, rural areas, and road sections. More work is needed in developing accident prediction models for urban unsignalized intersections.

CHAPTER III DATA COLLECTION AND MODEL DEVELOPMENT

3.0 Introduction

This chapter is divided into three sections. The first section contains a detailed description of the data used to develop the accident prediction models. It also includes a procedure to identify outliers which may affect the quality of the models. The second section describes the models developed and their goodness of fit. Finally, the third section shows a comparison between the developed models and similar models found in the literature.

3.1 Data Collection

This thesis made use of sample accident and traffic volume data corresponding to unsignalized (both T and 4-leg) intersections located in urban areas of the Greater Vancouver area and Vancouver Island.

3.1.1 Accident and Traffic Volume Data

Three years of accident data (1993-1995) were available for analysis for each intersection. The source of the accident data is the MV 104 accident reporting form, British Columbia's police accident report. The data set contained 427 intersections from the cities of Surrey, Victoria, Coquitlam, Vancouver, Burnaby and Nanaimo. The information available for each intersection includes the total number of accidents that occurred during the 1993-1995 period. The explanatory variables of accident occurrence included the traffic volumes on both the major and minor roads, given in Average Annual Daily Traffic (AADT), and the type of intersection (T or 4-leg).
Another explanatory variable taken into account in this thesis is the type of intersection control, which was only available for Surrey intersections. Traffic control types included 2-way Stop, 4-way Stop, and one-way Stop at T-intersections. Tables 3.1 and 3.2 provide a statistical summary of the data.

City | Intersections: Total | Intersections: T | Intersections: 4-leg | Accidents: Acc/year | Accidents: Acc/yr/Int. | Average AADT: Major Road | Average AADT: Minor Road
Surrey | 56 | 18 | 38 | 285 | 5.08 | 17,937 | 3,075
Victoria | 340 | 162 | 178 | 360 | 1.06 | 12,355 | 1,494
Nanaimo | 10 | 0 | 10 | 23 | 2.25 | 7,172 | 3,242
Coquitlam | 8 | 2 | 6 | 34 | 4.25 | 10,004 | 1,514
Burnaby | 9 | 3 | 6 | 36 | 4.04 | 12,984 | 2,837
Vancouver | 4 | 1 | 3 | 17 | 4.33 | 22,191 | 1,408
Total | 427 | 186 | 241 | 755 | 1.77 | 13,186 | 1,770

Table 3.1 Summary of Accident, Intersection Control and Traffic Volume Data

City | Accidents*: Max. | Accidents*: Min. | Accidents*: Std. Dev. | AADT Minor Road: Max. | AADT Minor Road: Min. | AADT Minor Road: Std. Dev. | AADT Major Road: Max. | AADT Major Road: Min. | AADT Major Road: Std. Dev.
Surrey | 11.0 | 1.7 | 2.3 | 9,300 | 500 | 2,060 | 42,600 | 2,100 | 10,385
Victoria | 8.3 | 0.0 | 1.2 | 11,000 | 100 | 1,483 | 47,800 | 500 | 9,397
Nanaimo | 4.8 | 0.6 | 1.5 | 6,025 | 1,968 | 1,307 | 15,739 | 2,771 | 4,132
Coquitlam | 8.7 | 1.0 | 2.7 | 2,360 | 730 | 542 | 32,310 | 730 | 10,642
Burnaby | 10.3 | 0.3 | 3.0 | 7,415 | 365 | 2,252 | 29,020 | 5,715 | 7,492
Vancouver | 8.3 | 0.3 | 3.5 | 2,550 | 860 | 775 | 37,295 | 7,835 | 12,070
Total | 11.0 | 0.0 | 2.1 | 11,000 | 100 | 1,673 | 47,800 | 500 | 9,673

* Indicates average annual accidents per intersection

Table 3.2 Statistical Summary of Accidents

As shown in Table 3.1, the average number of accidents per intersection for the cities located in the Lower Mainland (Surrey, Coquitlam, Burnaby and Vancouver) is much higher than the average number of accidents per intersection for the cities located on Vancouver Island (Victoria and Nanaimo). About 44% of the intersections are T-intersections, while the remaining 56% are 4-leg intersections.
This indicates that neither intersection type predominates in the database, unlike the studies by Mountain and Fawaz (1996) and Maher and Summersgill (1996), whose data sets included mainly T-intersections. This condition is desirable when developing a total accident model, as the model will not be biased in favor of one of the intersection types. As previously mentioned, intersection control type data is available only for Surrey intersections. Of the 56 intersections, 32 are two-way Stop controlled, 8 are 4-way Stop controlled, and the remaining 16 intersections are classified as one-way Stop T-intersections.

3.1.2 Outlier Analysis

Outliers are defined as data points that split off or are very different from the rest of the data (Stevens, 1986). Outliers can be caused by irregularities or errors that occurred during the data recording or observation process, or can arise when the data point is genuinely different from the rest. These points deserve further investigation in order to decide whether or not to remove them. Kulmala (1995) proposed a procedure to identify outliers based on the calculation of the leverage statistic. The leverage of a point is a measure of how far the x-value of the point is from the average of the rest of the x-values (NAG, 1994). The leverage values are the diagonal elements of the hat matrix, which is the matrix that multiplies the observed vector in order to yield the predicted vector. One of the properties of the leverage values, $h_i$, is that their sum over the n observations yields the number of parameters, p, in the model. Accordingly, the average value of the leverage is p/n, and many authors (NAG, 1994; Stevens, 1988) consider that a high leverage is one that exceeds 2p/n and should be subject to further examination.
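The two leverage properties mentioned above (the leverages sum to p, and values above 2p/n are flagged) can be checked with a small Python sketch. As a simplifying assumption it uses the closed-form leverage of an ordinary simple linear regression with an intercept (p = 2) rather than the weighted hat matrix that GLIM would produce for a fitted model; the data values and function names are hypothetical.

```python
def leverages_simple_regression(x):
    """Diagonal of the hat matrix for y = b0 + b1*x fitted by least squares.

    For this special case h_i = 1/n + (x_i - mean)^2 / sum((x_j - mean)^2).
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def high_leverage_points(leverages, n_params):
    """Indices of observations whose leverage exceeds the 2p/n rule of thumb."""
    threshold = 2.0 * n_params / len(leverages)
    return [i for i, h in enumerate(leverages) if h > threshold]


# Hypothetical x-values; the last one lies far from the others.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages_simple_regression(x_values)
flagged = high_leverage_points(h, n_params=2)
```

Here the leverages add up to the number of parameters (2), and only the isolated last point exceeds 2p/n.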
However, it has been shown (NAG, 1994) that the leverage alone is not a good indication of whether the parameter estimates are being affected by specific observations. A measure which does this is the Cook's distance (NAG, 1994). The Cook's distance measures the influence of observations on the model. The higher the Cook's distance value for a given observation, the stronger its influence on the model. The Cook's distance is calculated as follows:

$$c_i = \frac{1}{p} \left(\frac{h_i}{1 - h_i}\right) r_i^2 \qquad (3.1)$$

where,
$h_i$ = leverage value
p = number of parameters
$r_i$ = standardized residual

The main disadvantage of using the Cook's distance is that there is no clear rule for what constitutes a high $c_i$. NAG (1994) proposes sorting the data according to the Cook's distance values and, in a stepwise procedure, removing the points with the highest values and assessing the change in the scaled deviance for every point removed. Maycock and Hall (1984) have found that the difference in scaled deviance between two models with degrees of freedom $df_1$ and $df_2$ is χ² distributed with $(df_1 - df_2)$ degrees of freedom. This means that if only one point with a high Cook's distance is removed, then the difference in the scaled deviance must be greater than 3.8 (the χ² value for the 95% level of confidence and 1 degree of freedom) to be significant.

GLIM can extract both leverage and Cook's distance values from each model. The procedure used to identify outliers in the models developed in this thesis is to visually examine the relationship between the observed number of accidents for each intersection and the Cook's distance. Intersections with exceptionally large values of $c_i$ are then removed and the change in scaled deviance is determined. If this change is significant, the intersections are removed. This analysis was performed for all models in this thesis. After the analysis, none of the critical points were classified as outliers that should be removed.
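The stepwise check described above can be sketched in Python. The sketch assumes the usual form of the Cook's distance, c = (1/p) · h/(1 − h) · r², and compares each cumulative drop in scaled deviance with the χ² critical value for the same number of removed points; the drops and critical values below are the ones reported in Table 3.3 for the total model, while the function names are illustrative.

```python
def cooks_distance(leverage, std_residual, n_params):
    """Cook's distance from leverage h, standardized residual r and p parameters.

    Assumed form: c = (1/p) * h / (1 - h) * r^2.
    """
    return (leverage / (1.0 - leverage)) * std_residual ** 2 / n_params


def removal_warranted(cumulative_sd_drop, chi2_critical):
    """True when removing the points changes the scaled deviance significantly."""
    return cumulative_sd_drop > chi2_critical


# Table 3.3: cumulative scaled-deviance drop after removing 1..5 intersections,
# and the 95% chi-square critical values for 1..5 degrees of freedom.
cumulative_drops = [1.5, 3.5, 5.7, 7.8, 9.8]
chi2_critical_values = [3.8, 6.0, 7.8, 9.5, 11.1]

decisions = [
    removal_warranted(drop, crit)
    for drop, crit in zip(cumulative_drops, chi2_critical_values)
]
```

With the Table 3.3 values every decision is negative, which matches the conclusion that no intersection needed to be removed.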
Figure 3.1 and Table 3.3 show the results of this procedure for the total accident model. From visual examination of Figure 3.1, five intersections with a Cook's distance greater than 0.02 (tagged 1 through 5 in the figure) were selected as candidates for removal. As shown in Table 3.3, the cumulative drop in scaled deviance is always below the χ² statistic. This indicates that removing these intersections from the data set is not warranted. The analysis summarizing the results for the remaining models is shown in Appendix I.

[Figure 3.1 Identification of the Highest Cook's Distance Values for Total Model. Horizontal axis: Observed Accidents (acc/3 years); vertical axis: Cook's Distance (0.00 to 0.06). Triangular markers, tagged 1 through 5, denote high Cook's distance.]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 14 | 426 | 397.4 | 1.5 | 1.5 | 3.8
2 | 22 | 425 | 395.3 | 2.1 | 3.5 | 6.0
3 | 157 | 424 | 393.2 | 2.1 | 5.7 | 7.8
4 | 5 | 423 | 391.0 | 2.2 | 7.8 | 9.5
5 | 33 | 422 | 389.0 | 2.0 | 9.8 | 11.1

Table 3.3 Identification of Outliers for Total Model

3.2 Model Development

The main task of this research is to develop multivariate models to estimate the predicted number of accidents. Four categories of models were developed in this thesis: (1) models for the total number of accidents; (2) models for T and 4-leg intersections; (3) separate models for every region (Vancouver Island, the Lower Mainland, and Surrey); and (4) a model for Surrey including intersection control type. Since the average number of accidents per year per intersection is relatively small (especially for intersections in Victoria and Nanaimo), it was decided to use the number of accidents in a three-year period.
The models developed are assumed to follow the negative binomial distribution, which is included in the GLIM software package through a macro designed by NAG (1996). Out of the six goodness of fit tests described in section 2.3, the graphs of predicted accidents vs. the Prediction Ratio for every model are shown in Appendix II, while the rest of the tests are shown with the description of each model. In general, the Prediction Ratio graphs show dispersions similar to those obtained for the observed vs. predicted accident graphs. Appendix III shows the GLIM output of all models which, in addition to the models' parameters, includes the scaled deviance, the κ value (represented by THETA in the GLIM output), and the standard errors of the parameters. Note that the model under the Poisson distribution assumption is developed first.

3.2.1 Model for the Total Number of Accidents

A model relating the total number of accidents to the traffic volumes for the minor and major roads was developed. The whole data set is used for this analysis, and Table 3.4 shows the parameter estimates of the model and its goodness of fit.

Model Form | t-ratio a0 | t-ratio a1 | t-ratio a2 | SD (dof) | κ | Pearson χ² (χ² test)*
Acc/3 yrs = a0 × (AADT_maj rd/1000)^a1 × (AADT_min rd/1000)^a2 [estimated coefficient values illegible in the source text] | 3.2 | 7.8 | 12.4 | 399 (424) | 1.97 | 459 (472)

* Denotes significance at a 95-percent confidence level

Table 3.4 Model for the Total Number of Accidents

The Pearson χ² indicates significance at the 95% confidence level. The t-ratios are significant for all the variables included in the model, and the scaled deviance value is smaller than the number of degrees of freedom. Figure 3.2 shows the relationship between the observed and predicted number of accidents for the model. The results are symmetrically clustered around the 45° line to a reasonable extent, which is desirable.
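The numerical checks applied to Table 3.4 can be sketched in Python as follows. The sketch assumes the negative binomial variance μ + μ²/κ for the Pearson χ² and the Prediction Ratio (the variance function itself is defined earlier in the thesis and is not restated here), and it takes the χ² critical value as an input rather than computing it; all function names are illustrative.

```python
import math


def nb_variance(mu, kappa):
    """Negative binomial variance, assumed here to be mu + mu^2 / kappa."""
    return mu + mu ** 2 / kappa


def pearson_chi2(observed, predicted, kappa):
    """Sum of squared residuals, each scaled by the assumed variance."""
    return sum((y - mu) ** 2 / nb_variance(mu, kappa)
               for y, mu in zip(observed, predicted))


def prediction_ratios(observed, predicted, kappa):
    """(predicted - observed) / standard deviation for each site."""
    return [(mu - y) / math.sqrt(nb_variance(mu, kappa))
            for y, mu in zip(observed, predicted)]


def fit_checks(scaled_deviance, dof, pearson, chi2_critical, t_ratios,
               t_critical=1.96):
    """Collect the three numerical goodness-of-fit checks used in the text."""
    return {
        "deviance_not_above_dof": scaled_deviance <= dof,
        "pearson_below_critical": pearson < chi2_critical,
        "all_t_ratios_significant": all(abs(t) > t_critical for t in t_ratios),
    }


# Values reported in Table 3.4 for the total model.
total_model = fit_checks(399, 424, 459, 472, [3.2, 7.8, 12.4])
```

For the Table 3.4 values all three checks pass, in line with the assessment given above.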
In addition, Figure 3.3 shows the fit of the variance of the observed accidents (assuming a negative binomial distribution) to the average squared residuals. Each point represents the average predicted accident frequency for a sequenced group of intersections (e.g. the first twenty intersections sorted by predicted accident frequency). The figure shows a reasonably good fit.

[Figure 3.2 Total Model: Observed vs. Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.3 Total Model: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

Figure 3.2 also indicates that intersections in the Lower Mainland are generally different from those on Vancouver Island. Therefore, separate models for the Lower Mainland and Vancouver Island intersections should be developed.

3.2.2 Models for T and 4-leg Intersections

There are two ways to approach these kinds of models. The first is to develop separate models for T and 4-leg intersections. Alternatively, one model can be developed using the entire sample, as in the total model, with the intersection type variable (T or 4-leg intersection) included within the model.

Using the first approach, the sample size is 186 for T-intersections and 241 for 4-leg intersections. Table 3.5 shows the parameter estimates for each model, as well as the different goodness of fit tests. Both models have a relatively good fit with respect to the scaled deviance, and the χ² values are significant at the 95% confidence level. The t-ratios for all the independent variables are significant, which indicates that the models depend more on the explanatory variables than on a constant coefficient, which is also desirable.

Model | t-ratio a0 | t-ratio a1 | t-ratio a2 | SD (dof) | κ | Pearson χ² (χ² test)*
T-intersection model | -0.3 | 5.5 | 7.4 | 164 (183) | 2.34 | 205 (214)
4-leg intersection model | 3.6 | 6.8 | 9.1 | 230 (238) | 2.17 | 251 (274)

Model form (both models): Acc/3 yrs = a0 × (AADT_maj rd/1000)^a1 × (AADT_min rd/1000)^a2. The estimated coefficient values are largely illegible in the source text; for the 4-leg model the major-road exponent reads 0.4099.

* Denotes significance at a 95-percent confidence level

Table 3.5 Models for T and 4-leg Intersections

Figures 3.4 through 3.7 show the relationships between the observed and the predicted number of accidents, and the fit of the variance of the observed accidents to the average squared residuals, for both models. The results for both models are symmetrically clustered around the 45° line, and the average squared residuals fit the variance equation well.

[Figure 3.4 T-Intersection Model: Observed vs. Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.5 T-Intersection Model: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

[Figure 3.6 4-leg Intersection Model: Observed vs. Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs). Legend: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo.]

[Figure 3.7 4-leg Intersection Model: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

Using the second approach, a total model including the effect of intersection type was developed as a single equation covering both T and 4-leg intersections. This model follows the structure described in equation (2.14). Table 3.6 shows the parameter estimates of this model as well as the goodness of fit tests. The variable Type, which indicates the intersection type, takes two values: 1 for T-intersections and 2 for 4-leg intersections.
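The single-equation structure with an intersection type variable can be sketched in Python. The function below follows the product-of-flows form with traffic volumes expressed in thousands of vehicles per day and an exponential term for additional variables; the coefficient values are hypothetical placeholders chosen for illustration and are not the estimates reported in the thesis.

```python
import math


def predicted_accidents(aadt_major, aadt_minor, a0, a1, a2, extra_terms=()):
    """Product-of-flows model with optional exponential terms.

    extra_terms is a sequence of (coefficient, value) pairs, e.g. the
    intersection type coded 1 for T and 2 for 4-leg intersections.
    """
    exponent = sum(b * x for b, x in extra_terms)
    return (a0
            * (aadt_major / 1000.0) ** a1
            * (aadt_minor / 1000.0) ** a2
            * math.exp(exponent))


# Hypothetical coefficients for illustration only (not the thesis estimates).
A0, A1, A2, B_TYPE = 0.5, 0.4, 0.6, 0.4

t_int = predicted_accidents(12000, 2000, A0, A1, A2, [(B_TYPE, 1)])
four_leg = predicted_accidents(12000, 2000, A0, A1, A2, [(B_TYPE, 2)])
```

With this coding, the ratio of the 4-leg prediction to the T-intersection prediction at equal traffic volumes depends only on the type coefficient, since it equals e raised to that coefficient.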
According to the results described in Table 3.6, all variables are significant in the model, the scaled deviance is close to the number of degrees of freedom, and the Pearson χ² test indicates significance at the 95% confidence level.

Model | t-ratio a0 | t-ratio a1 | t-ratio a2 | t-ratio b1 | SD (dof) | κ | Pearson χ² (χ² test)*
Total Model with Intersection Type | -2.8 | 8.7 | 11.7 | 6.3 | 394 (423) | 2.23 | 449 (471)

Model form: Acc/3 yrs = 0.5116 × (AADT_maj rd/1000)^0.4221 × (AADT_min rd/1000)^0.6480 × e^(b1 × Type) [the value of b1 is illegible in the source text]

* Denotes significance at a 95-percent confidence level

Table 3.6 Total Model Including the Effect of Intersection Type

A brief comparison between this model and the total model developed in section 3.2.1 shows a smaller scaled deviance and Pearson χ² for the model that includes intersection type. Note that decreasing the degrees of freedom by 1 leads to a drop in scaled deviance of 5, which is greater than 3.8 (the 95-percent value of the χ² distribution with 1 degree of freedom). This indicates the importance of including in the model as many explanatory variables as possible in order to get a better fit.

Figures 3.8 and 3.9 show the relationships between the observed and the predicted number of accidents, and the fit of the observed accidents variance to the average squared residuals, respectively. The results in Figure 3.8 show that the points are closer to the 45° line than the results displayed in Figure 3.2 (Total Model). Figure 3.9 also shows the tendency of the model's average squared residuals to follow the variance equation.

[Figure 3.8 Total Model with the Effect of Intersection Type: Observed vs. Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.9 Total Model with the Effect of Intersection Type: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

A comparative analysis between the two approaches developed in this section for the intersection type model is shown in Table 3.7. The analysis made use of the entire sample of this study (427 intersections), and the number of predicted accidents was calculated by using the separate models and the total model including the intersection type. The results show that the single model provides a slightly lower Pearson χ² and sum of squared errors. This indicates that the total model with the intersection type variable performs slightly better than the two separate models. However, the difference between the two approaches is not significant, since the single model fits better for only 51% of the data.

Parameters | Separate Models | Single Model
Pearson χ² | 456 | 449
χ² test (0.05, n-p-1) | 472 | 471
Sum of Error² | 12,117 | 11,882
Closer Estimates | 208 (49%) | 219 (51%)

Table 3.7 Intersection Type Model: 2 Separate Models vs Single Model

Figure 3.10 shows the predicted accidents as a function of major road traffic volume for both approaches. This figure shows that for 4-leg intersections, the separate model curve is slightly above the single model curve. For T-intersections, at low major road traffic volumes (AADT below 12,000 veh/day) both curves are practically the same, but at high traffic volumes the separate model curve is also slightly above the single model curve.

The results of the comparison show that using a single model seems to be slightly more accurate than using separate models, but the differences between the two approaches are relatively small. Therefore, using either single or separate models will yield practically the same results, and both approaches are valid.
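The comparison summarized in Table 3.7 rests on two simple quantities: the sum of squared errors of each approach and the number of sites at which each approach gives the closer estimate. A minimal Python sketch of that bookkeeping is given below; the observed and predicted values are hypothetical and serve only to show the calculation.

```python
def compare_approaches(observed, pred_separate, pred_single):
    """Sum of squared errors and count of closer estimates for two models."""
    sse_separate = sum((y - p) ** 2 for y, p in zip(observed, pred_separate))
    sse_single = sum((y - p) ** 2 for y, p in zip(observed, pred_single))
    closer_separate = sum(
        1 for y, ps, pg in zip(observed, pred_separate, pred_single)
        if abs(y - ps) < abs(y - pg)
    )
    closer_single = sum(
        1 for y, ps, pg in zip(observed, pred_separate, pred_single)
        if abs(y - pg) < abs(y - ps)
    )
    return {
        "sse_separate": sse_separate,
        "sse_single": sse_single,
        "closer_separate": closer_separate,
        "closer_single": closer_single,
    }


# Hypothetical three-site example (not thesis data).
result = compare_approaches(
    observed=[3, 5, 2],
    pred_separate=[2.0, 6.0, 2.5],
    pred_single=[3.5, 5.5, 1.0],
)
```

Ties, where both approaches are equally close, are counted for neither approach in this sketch.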
[Figure 3.10 Intersection Type Model: Separate vs. Single Models. Horizontal axis: Major Road AADT (thousands). Curves: Single T-Int, Single 4-leg, Separate T-Int, Separate 4-leg.]

In addition, Figure 3.10 shows that T-intersections are approximately 50% safer than 4-leg intersections. A more detailed analysis regarding this point will be introduced in Chapter 5.

3.2.3 Regional Models

As described earlier, intersections in the Lower Mainland are generally different from those on Vancouver Island. Therefore, three regional models were developed: (1) a model for the Lower Mainland, which comprises intersections located in the cities of Surrey, Coquitlam, Vancouver and Burnaby; (2) a model for Vancouver Island, which comprises intersections located in Victoria and Nanaimo; and (3) a model for Surrey. It was decided to develop a model for Surrey because it has the highest average number of accidents per intersection. The sample sizes are 77, 350, and 56 for the Lower Mainland, Vancouver Island and Surrey models, respectively.

Table 3.8 shows the results of each model. For all models, the Pearson χ² values indicate significance at the 95% confidence level. The scaled deviance is smaller than the number of degrees of freedom for the Vancouver Island model and slightly larger for the Lower Mainland and Surrey models. The t-ratios are significant at the 95% confidence level for all parameters, except for the major road traffic volume in the Surrey model, which is significant only at the 90% confidence level. Therefore, it is suggested that a larger sample size be used for the Surrey model, or that more explanatory variables be added, in order to obtain a more reliable model.

Model | t-ratio a0 | t-ratio a1 | t-ratio a2 | SD (dof) | κ | Pearson χ² (χ² test)*
Vancouver Island model | 2.6 | 6.0 | 9.0 | 302 (347) | 2.92 | 383 (390)
Lower Mainland model | 7.6 | 2.4 | 3.7 | 81 (74) | 6.27 | 76 (94)
Surrey model | 8.7 | 1.8 | 2.3 | 56 (53) | 8.89 | 58 (70)

Model form (all models): Acc/3 yrs = a0 × (AADT_maj rd/1000)^a1 × (AADT_min rd/1000)^a2. The estimated coefficient values are largely illegible in the source text; for the Surrey model a0 reads 8.441.

* Denotes significance at a 95-percent confidence level

Table 3.8 Regional Models

Figures 3.11 through 3.16 show the relationships between the observed and the predicted number of accidents, and the fit of the variance of the observed accidents to the average squared residuals. The results for the Vancouver Island and Lower Mainland models (Figures 3.11 through 3.14) are symmetrically clustered around the 45° line, and the average squared residuals follow the variance to a satisfactory extent. For the Surrey model, the results shown in Figures 3.15 and 3.16 indicate a larger dispersion, which confirms the need to either use a larger sample or add more explanatory variables to the model.

[Figure 3.11 Vancouver Island Model: Observed vs Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.12 Vancouver Island Model: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

[Figure 3.13 Lower Mainland Model: Observed vs Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.14 Lower Mainland Model: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

[Figure 3.15 Surrey Total Model: Observed vs Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.16 Surrey Total Model: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

Figure 3.17 shows a comparison of the total model estimated in section 3.2.1 with the three regional models. It should be noted that the total model lies between the Vancouver Island and the Lower Mainland models, which is expected. The total model curve is closer to the Vancouver Island model because more than 80% of the data comes from the cities of Victoria and Nanaimo.

[Figure 3.17 Comparison of Total Model with Regional Models. Horizontal axis: Major Road AADT (thousands). Curves: Total, Surrey, Vancouver Island, Lower Mainland.]

3.2.4 Effect of Intersection Control Type

Since data on intersection control type was only available for Surrey intersections, a Surrey total model including control type was estimated. As mentioned earlier, of the 56 intersections in the Surrey data, 32 are classified as 2-way Stop controlled, 8 as 4-way Stop controlled and the remaining 16 as one-way Stop controlled T-intersections. The type variable in the equation takes the values 1, 2 and 3, respectively, for these control types.

Table 3.9 shows the results of this model. It can be noted that the $a_0$ parameter is much more significant than the three variables included in the model. This parameter also has a relatively high value. This is considered a deficiency, since it indicates that the number of accidents is less dependent on traffic volumes and the control type. The t-ratio for the control type is not significant.
As with the previous Surrey total model, it is therefore suggested that a larger sample size be used to develop this model.

Model | t-ratio a0 | t-ratio a1 | t-ratio a2 | t-ratio b1 | SD (dof) | κ | Pearson χ² (χ² test)*
Surrey Total Model with Control Type | 8.8 | 2.0 | 2.6 | -1.0 | 55 (52) | 9.01 | 58 (69)

Model form: Acc/3 yrs = 8.8906 × (AADT_maj rd/1000)^a1 × (AADT_min rd/1000)^0.2256 × e^(-0.06994 × control type) [the value of a1 is illegible in the source text]

* Denotes significance at a 95-percent confidence level

Table 3.9 Surrey Total Model with Control Type

Figures 3.18 and 3.19 show the relationship between the observed and the predicted number of accidents, and the fit of the variance of the observed accidents to the average squared residuals. In both figures the points are dispersed around the lines, which indicates a relatively poor fit.

[Figure 3.18 Surrey Total Model with Control Type: Observed vs Predicted Number of Accidents. Horizontal axis: Observed Accidents (acc/3 yrs).]

[Figure 3.19 Surrey Total Model with Control Type: Predicted Accidents vs Estimated Variance. Horizontal axis: Predicted Accidents (acc/3 yrs).]

3.3 Comparison with Previous Results

As mentioned in Chapter 2, there are few studies which developed accident prediction models for urban unsignalized intersections. Therefore, this section will only compare the models developed herein to those developed by Maher and Summersgill (1996) and Mountain and Fawaz (1996). Since the databases used to obtain these models consisted mainly of T-intersections, the comparison was performed on the separate T-intersection model described in section 3.2.2. Figure 3.20 shows the results of these three models for a constant minor road traffic volume of 2,000 vehicles per day. The T-intersection model developed in this thesis predicts higher accident frequencies than the other two models.
The difference in results may be attributed to the fact that Maher and Summersgill's model included only T-intersections on urban single carriageways, while Mountain and Fawaz's model included both urban and rural intersections. In addition, regional characteristics and accident reporting practice differ between the UK and British Columbia (different reporting limit, police attendance, etc.).

Figure 3.20 Comparison of T-Intersection Model with Previous Studies (x-axis: Major Road AADT, thousands; series: T-Intersection Model, Maher and Summersgill, Mountain and Fawaz)

3.4 Conclusion

Using the negative binomial distribution approach, eight different accident prediction models were developed. The first model included the entire data set and related accident frequency to traffic volumes on the major and minor roads. The remaining models were classified according to characteristics such as intersection type (T and 4-leg intersections), region, and intersection control type.

According to the various quality tests performed in this chapter, six out of the eight models showed a good statistical fit. The two models that showed a poor fit were characterized by a smaller sample size. It was suggested to increase the sample size or to include more explanatory variables in these models.

For the intersection type model, two different approaches were used: (1) developing two separate models, one for each intersection type; and (2) developing a single model that includes the intersection type as one of the variables. The differences between these two approaches were relatively small, and the effect of intersection type can be measured by using either approach.
Finally, a procedure to identify outliers in the data set was performed using Cook's distance values. The procedure indicated that there were no outliers in the data.

CHAPTER IV STATISTICAL CONSIDERATIONS

4.0 Introduction

In this chapter several statistical issues are discussed. The first issue relates to the error structure distribution. As described earlier, for the GLIM approach, the error structure is usually assumed to be Poisson or negative binomial. A comparison is made between the two error structure distributions. The second issue relates to the method of calculating the parameter K of the negative binomial distribution. A comparison of several approaches to calculate K is presented.

4.1 Poisson vs. Negative Binomial Distribution Error Structure

As mentioned in Chapter Two, the dispersion parameter ($\sigma_d$, defined in equation 2.7) can be used to decide whether to use the Poisson or the negative binomial distribution error structure. If the dispersion parameter in the Poisson distribution model is greater than one, then the negative binomial distribution is recommended.

The Poisson distribution was used as a first step to develop all eight models discussed in Chapter 3. Appendix 2 shows the GLIM session results for the Poisson distribution. Table 4.1 shows a comparative analysis of these two approaches. Note that the dispersion parameter for the Poisson distribution is considerably high for all models, ranging from 2.57 for the Vancouver Island model to 4.38 for the 4-leg intersection model. These high values are consistent with the lack of significance of the Pearson $\chi^2$ tests. This indicates that for all models the data have greater dispersion than can be explained by the Poisson distribution, and it is necessary to assume a negative binomial distribution error structure.
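Under the latter distribution, $\sigma_d$ ranges from 1.02 for the Lower Mainland model to 1.12 for the T-intersection and Surrey control type models. This indicates that the data dispersion is satisfactorily explained by the negative binomial distribution.

| Parameter | Total Model: Poisson | Total Model: Neg bin | Total with Intersection Type: Poisson | Total with Intersection Type: Neg bin | T-Intersection: Poisson | T-Intersection: Neg bin | 4-Leg Intersection: Poisson | 4-Leg Intersection: Neg bin |
|---|---|---|---|---|---|---|---|---|
| $a_0$ | 1.4833 | 1.4929 | 0.5906 | 0.5776 | 0.6717 | 0.9333 | 1.9007 | 1.6947 |
| $a_1$ | 0.4067 | 0.3839 | 0.4336 | 0.4221 | 0.5809 | 0.4531 | 0.3884 | 0.4099 |
| $a_2$ | 0.6086 | 0.7044 | 0.5937 | 0.6480 | 0.5902 | 0.5856 | 0.5944 | 0.7065 |
| $b_1$ | | | 0.5268 | 0.5379 | | | | |
| Dispersion parameter, $\sigma_d$ | 4.30 | 1.08 | 3.86 | 1.06 | 3.26 | 1.12 | 4.38 | 1.06 |
| K | | 1.97 | | 2.2 | | 2.35 | | 2.17 |
| Scaled deviance | 1577 | 399 | 1436 | 394 | 485 | 164 | 942 | 230 |
| Degrees of freedom | 424 | 424 | 423 | 423 | 183 | 183 | 238 | 238 |
| Pearson $\chi^2$ | 1823 | 459 | 1634 | 449 | 597 | 205 | 1042 | 251 |
| $\chi^2$ (95%) | 472 | 472 | 471 | 471 | 214 | 214 | 274 | 274 |
| Error$^2$ | 12494 | 12996 | 11687 | 11883 | 3267 | 3347 | 8272 | 8770 |
| Closer estimates | 44% | 56% | 44% | 56% | 52% | 48% | 43% | 57% |

| Parameter | Vancouver Island: Poisson | Vancouver Island: Neg bin | Lower Mainland: Poisson | Lower Mainland: Neg bin | Surrey Total: Poisson | Surrey Total: Neg bin | Surrey Control Type: Poisson | Surrey Control Type: Neg bin |
|---|---|---|---|---|---|---|---|---|
| $a_0$ | 1.3327 | 1.3807 | 6.7666 | 6.5929 | 8.5677 | 8.4401 | 8.9442 | 8.8906 |
| $a_1$ | 0.3231 | 0.3042 | 0.2036 | 0.2011 | 0.1529 | 0.1516 | 0.1647 | 0.1645 |
| $a_2$ | 0.5240 | 0.5488 | 0.2474 | 0.2864 | 0.1720 | 0.1907 | 0.1958 | 0.2256 |
| $b_1$ | | | | | | | -0.0570 | -0.0699 |
| Dispersion parameter, $\sigma_d$ | 2.57 | 1.10 | 3.35 | 1.02 | 2.97 | 1.10 | 3.00 | 1.12 |
| K | | 2.92 | | 6.27 | | 8.89 | | 9.10 |
| Scaled deviance | 734 | 302 | 250 | 81 | 152 | 56 | 150 | 55 |
| Degrees of freedom | 347 | 347 | 74 | 74 | 53 | 53 | 52 | 52 |
| Pearson $\chi^2$ | 893 | 383 | 248 | 76 | 158 | 58 | 156 | 58 |
| $\chi^2$ (95%) | 390 | 390 | 94 | 94 | 70 | 70 | 69 | 69 |
| Error$^2$ | 3879 | 3892 | 3633 | 3681 | 2431 | 2440 | 2417 | 2438 |
| Closer estimates | 42% | 58% | 44% | 56% | 41% | 59% | 52% | 48% |

Table 4.1 Comparison between Poisson and Negative Binomial Distribution

In addition to the dispersion parameter, Table 4.1 also shows other measures for comparing the two approaches, namely the scaled deviance, the Pearson $\chi^2$, the sum of squared errors and the share of predicted accidents that are closer to the observed accidents.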
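As a quick consistency check on the dispersion values in Table 4.1, the short Python sketch below recomputes the dispersion parameter for a few models. It assumes that equation 2.7 (not reproduced in this chapter) defines the dispersion parameter as the Pearson chi-square divided by the degrees of freedom; that reading is an assumption, although it matches the tabulated values to within rounding. The dictionary name and the function name are illustrative only.

```python
# Values transcribed from Table 4.1:
# (Pearson chi-square, degrees of freedom, dispersion parameter as reported)
TABLE_4_1 = {
    ("Total", "Poisson"): (1823, 424, 4.30),
    ("Total", "Neg bin"): (459, 424, 1.08),
    ("T-intersection", "Poisson"): (597, 183, 3.26),
    ("T-intersection", "Neg bin"): (205, 183, 1.12),
    ("Vancouver Island", "Poisson"): (893, 347, 2.57),
    ("Vancouver Island", "Neg bin"): (383, 347, 1.10),
}


def dispersion(pearson_chi2, dof):
    """Pearson chi-square per degree of freedom (assumed form of equation 2.7)."""
    return pearson_chi2 / dof


for (model, error_structure), (chi2, dof, reported) in TABLE_4_1.items():
    value = dispersion(chi2, dof)
    print(f"{model:17s} {error_structure:8s} computed={value:.2f} reported={reported:.2f}")
```

Under this assumption, a Poisson value well above one signals overdispersion, which is the criterion the text uses to move to the negative binomial error structure.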
The scaled deviance in the Poisson distribution is considerably greater for all models and exceeds the number of degrees of freedom by between 112% for the Vancouver Island model and 296% for the 4-leg model. In contrast, for the negative binomial distribution models, the scaled deviance is relatively close to the degrees of freedom, which indicates a reasonably good fit.

Regarding the other comparative tests, Table 4.1 shows that the sum of squared errors is slightly smaller in the Poisson models than in the negative binomial models. However, under the negative binomial assumption, more predicted values are closer to the observed data. This indicates that, while most of the data fit the negative binomial distribution model better, the observations that fit the Poisson distribution better show larger differences from the observed values under the negative binomial models.

4.2 Approaches for Estimating the Negative Binomial Distribution Parameter K

There are several approaches to estimate the parameter K of the negative binomial distribution error (Famoye, 1997). The macro library of the GLIM software package contains three methods: maximum likelihood and two methods of moments called mean $\chi^2$ and mean deviance. In addition, Kulmala (1995), following Maycock and Hall (1984), proposed a method of moments in which the parameter K is initially calculated from the estimates obtained from the Poisson distribution model. All these methods are iterative.

The method of maximum likelihood has been the most widely used (Hauer et al., 1988; Bonneson and McCoy, 1993; Maher and Summersgill, 1996). According to Lawless (1987), this method is based on the log-likelihood function, which is the natural logarithm of the joint probability function of the negative binomial distribution (equation 2.5). This is a function of $\mu$
and K, where $\mu$ is also a function of the parameter estimates, $a_i$. The iterative process is aimed at maximizing the log-likelihood function with respect to the parameter estimates, $a_i$, for selected values of K. The iterative process continues until the maximum value of K has been reached.

The mean $\chi^2$ method consists of fitting the Pearson $\chi^2$ value to the number of degrees of freedom. As a first iteration, K is solved from the Pearson $\chi^2$ equation, and the initial estimates are calculated by using the Poisson distribution. Given this initial value of K, the new parameters are estimated. The process is then repeated until convergence.

The mean deviance method is similar to the mean $\chi^2$ method, the main difference being that the scaled deviance is forced to equal the number of degrees of freedom.

The method of moments proposed by Kulmala (1995) and Maycock and Hall (1984) consists of estimating a first value of K based on the following equation:

$$K = \frac{\sum_{i=1}^{n} E(A)_i^2}{\sum_{i=1}^{n} \left(\text{error}_i^2 - E(A)_i\right)} \qquad (4.1)$$

where the predicted values $E(A)_i$ are initially estimated based on the Poisson distribution model. The K value is then passed to a GLIM macro to estimate the parameters under the negative binomial distribution. As with the previous methods of moments, the process is repeated until convergence. According to Kulmala (1995), the estimates obtained with this method deviate less than 5% from those produced by the maximum likelihood method.

These four methods were used to estimate the parameter K. The results obtained in the GLIM sessions are shown in Appendix IV. Table 4.2 summarizes the results obtained for each method. The table shows that the parameter values agree to the first two decimal places for all methods, and the t-ratios show that the variables are significant in all cases. This indicates that the four methods give broadly similar results.
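The first step of the method of moments described above can be illustrated with a short Python sketch. It reads the first-step estimate as the sum of squared predictions divided by the sum of (squared error minus prediction), which is what the negative binomial variance relation Var = E(A) + E(A)^2/K implies; that reading of equation 4.1 is an assumption. The function name and the five data points are hypothetical, and the subsequent refitting of the model in GLIM is not shown.

```python
def moment_estimate_k(observed, predicted):
    """First-step method-of-moments estimate of the negative binomial K.

    Uses K = sum(E(A)_i^2) / sum(error_i^2 - E(A)_i), where error_i is the
    difference between the observed and the predicted count at site i.
    """
    numerator = sum(mu * mu for mu in predicted)
    denominator = sum((y - mu) ** 2 - mu for y, mu in zip(observed, predicted))
    if denominator <= 0:
        # Squared errors do not exceed the Poisson variance: no overdispersion.
        raise ValueError("K is undefined: no overdispersion relative to Poisson")
    return numerator / denominator


# Hypothetical counts and Poisson-model predictions (not thesis data).
observed = [2, 9, 0, 14, 5]
predicted = [4.0, 5.0, 3.0, 8.0, 6.0]
print(moment_estimate_k(observed, predicted))
```

In an iterative scheme of the kind the text describes, this value would be used to refit the negative binomial model, the predictions would be updated, and the estimate would be recomputed until it stabilizes.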
| Parameter | Maximum likelihood: value | Maximum likelihood: t-ratio | Mean $\chi^2$: value | Mean $\chi^2$: t-ratio | Mean deviance: value | Mean deviance: t-ratio | Moments (Kulmala): value | Moments (Kulmala): t-ratio |
|---|---|---|---|---|---|---|---|---|
| $a_0$ | 1.4929 | 3.2 | 1.4963 | 3.1 | 1.4905 | 3.3 | 1.4947 | 3.4 |
| $a_1$ | 0.3839 | 7.8 | 0.3827 | 7.5 | 0.3850 | 8.0 | 0.3832 | 8.0 |
| $a_2$ | 0.7044 | 12.4 | 0.7058 | 12.0 | 0.7023 | 12.8 | 0.7054 | 12.8 |
| K | 1.97 | | 1.76 | | 2.15 | | 1.85 | |
| Scaled deviance | 399 | | 370 | | 424 | | 382 | |
| Degrees of freedom | 424 | | 424 | | 424 | | 424 | |
| Pearson $\chi^2$ | 459 | | 424 | | 489 | | 439 | |
| $\chi^2$ (95%) | 472 | | 472 | | 472 | | 472 | |
| Error$^2$ | 12996 | | 13006 | | 12981 | | 13003 | |

Table 4.2 Results of Different Negative Binomial Methods in the Total Model

The parameter K differs more between methods than the model parameters do. The highest K value is obtained through the method of mean deviance, followed by the method of maximum likelihood. According to the criterion of maximizing K, which reduces the variance, the best method would be the mean deviance, while the worst would be the mean $\chi^2$ method. However, according to the Pearson $\chi^2$ statistic, the method with the highest K is not significant at the 95% confidence level. Therefore, the best method would be the maximum likelihood, which has the second highest K parameter.

Table 4.2 also shows that the scaled deviance value for all methods is significant compared with the degrees of freedom. The Pearson $\chi^2$ statistic is significant at the 95% confidence level for all methods except the mean deviance method, and the sums of squared errors show that the estimates are quite similar, which is a result of the similarity in the model parameters.

Figure 4.1 shows the predicted accidents of the total model as a function of major road traffic volume for the different methods. In this figure all curves merge into one, which confirms the similarities found in Table 4.2.

This analysis shows that, with the exception of the mean deviance method, which was not significant according to the Pearson $\chi^2$ statistic, the other three methods yield approximately the same results.
However, of the three significant methods, the maximum likelihood yields the highest K value, and for this reason it is regarded as the most appropriate method.

Figure 4.1 Predicted Accidents using Different Methods to Obtain K (x-axis: Major Road AADT, thousands; series: Maximum Likelihood, Mean Chi-Square, Mean Deviance, Moments (Kulmala))

4.3 Conclusion

This chapter was intended to demonstrate the advantages of the methodology used in Chapter 3 to derive the accident prediction models. First, it was demonstrated in a comparative fashion that the accident prediction models for urban unsignalized intersections follow the negative binomial distribution rather than the Poisson distribution. Next, it was demonstrated that the maximum likelihood method is the most appropriate for calculating the negative binomial parameter K because it yields the maximum value of K among the significant models. However, the mean $\chi^2$ method and the method of moments proposed by Kulmala also yielded significant results that were similar to those of the maximum likelihood method.

CHAPTER V APPLICATIONS

5.0 Introduction

As described earlier, there are several applications of accident prediction models. This chapter describes five different applications. The first four relate to the use of the Empirical Bayes refinement: identification of accident-prone locations, development of critical accident frequency curves, ranking of the identified accident-prone locations, and before and after safety evaluation. The fifth application provides a safety-planning example, comparing the safety performance of a 4-leg intersection and two staggered T-intersections for the same traffic volume.

Empirical Bayes refinement applications are demonstrated using the model relating the total number of accidents to traffic flows (Table 3.4) because it is the most general model.
The Vancouver Island and the Lower Mainland models are also used in the identification and ranking of accident-prone locations.

5.1 Empirical Bayes Refinement

As mentioned in Section 2.5, the main goal of using the Empirical Bayes refinement is to yield a more accurate, location-specific safety estimate by combining the observed number of accidents at the location with the predicted number of accidents obtained from the GLIM model. To illustrate this process, assume that an unsignalized intersection has the following data:

Major road ADT = 15,000 veh/day
Minor road ADT = 2,000 veh/day
Observed accidents = 11 acc/3 years

Using the model from Table 3.4, the safety of this intersection is:

$$\text{pred} = 1.4929 \times \left(\frac{15{,}000}{1{,}000}\right)^{0.3839} \times \left(\frac{2{,}000}{1{,}000}\right)^{0.7044} = 6.88 \text{ acc/3 years}$$

Using equations 2.17 and 2.18, the Empirical Bayes safety estimate and its variance, respectively, can be calculated as:

$$EB_{\text{safety estimate}} = \left(\frac{1.97}{1.97 + 6.88}\right) \times 6.88 + \left(\frac{6.88}{1.97 + 6.88}\right) \times 11 = 10.08 \text{ acc/3 years}$$

$$Var(EB_{\text{safety estimate}}) = \left(\frac{6.88}{6.88 + 1.97}\right)^2 \times 1.97 + \left(\frac{6.88}{6.88 + 1.97}\right)^2 \times 11 = 7.84 \text{ (acc/3 years)}^2$$

In this example the expected number of accidents is reduced from 11 to 10.08, which corresponds to a regression to the mean correction of about eight percent.

Figure 5.1 illustrates the Empirical Bayes refined estimates versus the values predicted from the GLIM model. Notice that the EB estimates are much closer to the 45° line, indicating an average regression to the mean correction of 35%, although for some extreme cases the corresponding correction is up to 150%.

Figure 5.1 Predicted vs. EB Refined Number of Accidents for Total Model (x-axis: Observed Accidents, acc/3 yrs)

5.2 Identification of Accident Prone Locations

Accident prone locations (APLs) are defined as the locations that exhibit a significant number of accidents compared to a specific norm.
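Before turning to the identification procedure, the worked example of Section 5.1 can be checked numerically. The Python sketch below is a minimal illustration, assuming the total model coefficients quoted in the example (1.4929, 0.3839, 0.7044) and K = 1.97, and assuming that equations 2.17 and 2.18 have the weighted-average form used in that example; the function names are hypothetical.

```python
def predict_total_model(major_aadt, minor_aadt, a0=1.4929, a1=0.3839, a2=0.7044):
    """Predicted accidents per 3 years from the total model (coefficients from the example)."""
    return a0 * (major_aadt / 1000.0) ** a1 * (minor_aadt / 1000.0) ** a2


def eb_refine(predicted, observed, k):
    """Empirical Bayes estimate and variance, in the form used in the Section 5.1 example."""
    weight = predicted / (k + predicted)          # weight placed on the observed count
    estimate = (1.0 - weight) * predicted + weight * observed
    variance = weight ** 2 * (k + observed)       # equals weight^2 * K + weight^2 * count
    return estimate, variance


pred = predict_total_model(15000, 2000)
eb, var_eb = eb_refine(pred, observed=11, k=1.97)
print(f"predicted = {pred:.2f}, EB = {eb:.2f}, Var(EB) = {var_eb:.2f}")
```

With these inputs the sketch gives about 6.88, 10.08 and 7.84, in agreement with the values in the example. A larger K moves the weight toward the model prediction, a smaller K toward the observed count.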
Because of the randomness inherent in accident occurrence, statistical techniques that account for this randomness should be used when identifying APLs. The EB refinement method can be used to identify APLs according to the following process (Belanger, 1994):

1. Estimate the predicted number of accidents and its variance for the intersection, using the appropriate GLIM model. This follows a gamma distribution (the prior distribution) with parameters $\alpha_1$ and $\beta_1$, where:

$$\beta_1 = \frac{E(A)}{Var(E(A))} = \frac{K}{E(A)} \quad \text{and} \quad \alpha_1 = \beta_1 \cdot E(A) = K \qquad (5.1)$$

2. Determine the appropriate point of comparison based on the mean and variance values obtained in step (1). Usually the 50th percentile ($P_{50}$) is used as a point of comparison. $P_{50}$ is calculated such that:

$$\int_0^{P_{50}} \frac{(K/E(A))^{K}\, \lambda^{K-1}\, e^{-(K/E(A))\lambda}}{\Gamma(K)}\, d\lambda = 0.5 \qquad (5.2)$$

3. Calculate the EB safety estimate and its variance from equations (2.17) and (2.18) respectively. This is also a gamma distribution (the posterior distribution) with parameters $\alpha_2$ and $\beta_2$:

$$\beta_2 = \frac{EB}{Var(EB)} = \frac{K}{E(A)} + 1 \quad \text{and} \quad \alpha_2 = \beta_2 \cdot EB = K + count \qquad (5.3)$$

Then, the probability density function of the posterior distribution is:

$$f_{EB}(\lambda) = \frac{(K/E(A) + 1)^{K + count}\, \lambda^{K + count - 1}\, e^{-(K/E(A) + 1)\lambda}}{\Gamma(K + count)} \qquad (5.4)$$

4. Identify the location as accident-prone if there is a significant probability that the intersection's safety estimate exceeds the $P_{50}$ value. Thus, the location is identified as accident-prone if:

$$1 - \int_0^{P_{50}} \frac{(K/E(A) + 1)^{K + count}\, \lambda^{K + count - 1}\, e^{-(K/E(A) + 1)\lambda}}{\Gamma(K + count)}\, d\lambda \geq \delta \qquad (5.5)$$

where $\delta$ represents the desired confidence level (usually 0.95).

For the example given in the previous section, the predicted number of accidents and its variance are 6.88 acc/3yr and 24.67 (acc/3yr)$^2$ respectively. Then, using equation 5.2 to obtain the $P_{50}$ value:

$$\int_0^{P_{50}} \frac{(1.97/6.88)^{1.97}\, \lambda^{1.97 - 1}\, e^{-(1.97/6.88)\lambda}}{\Gamma(1.97)}\, d\lambda = 0.5$$

Solving the integral for 0.5, the $P_{50}$ value is 5.75 acc/3yr. From the previous section, the EB estimate and its variance are 10.08 acc/3yr and 7.84 (acc/3yr)$^2$ respectively.
Using equation 5.5, the left-hand side of the equation is:

$$1 - \int_0^{5.75} \frac{(1.97/6.88 + 1)^{1.97 + 11}\, \lambda^{1.97 + 11 - 1}\, e^{-(1.97/6.88 + 1)\lambda}}{\Gamma(1.97 + 11)}\, d\lambda = 0.96$$

This indicates that there is a significant probability (96%) of exceeding the $P_{50}$ value, and the intersection can be considered accident-prone. Figure 5.2 shows a graphical representation of this example.

Figure 5.2 Identification of Accident Prone Locations (x-axis: Accidents/3 years)

5.3 Critical Accident Frequency Curves

The process of identifying accident-prone locations, as described in the previous section, involves considerable computational effort. To facilitate this process, critical accident frequency curves can be developed for each GLIM model. A critical curve indicates the number of observed accidents that must be exceeded in order to classify the location as accident-prone for a given GLIM model and confidence level.

The procedure to obtain these critical curves is iterative and makes use of equations (5.2) and (5.5). The initial data are the predicted numbers of accidents based on a GLIM model with its K parameter. For each predicted number of accidents, the $P_{50}$ value is calculated by using equation (5.2). This value is used in equation (5.5), which, for a given level of confidence, is solved iteratively to find the observed number of accidents (the variable count in equation (5.5)) that meets the given level of confidence. The critical curve is obtained by joining all the critical points in a Predicted versus Observed Accidents chart.

As an example, Figures 5.3, 5.4 and 5.5 show these curves for the total model (Table 3.4) and the Vancouver Island and Lower Mainland models (Table 3.8). Three curves are shown in each figure, representing the 90%, 95%, and 99% confidence levels. To illustrate the use of these curves, consider the example described in Section 5.1.
Using the total model and the given traffic volumes, 6.88 accidents/3 years are estimated. For this number of accidents and a 99% confidence level, at least 13 accidents/3 years need to be observed to consider this intersection accident-prone (Figure 5.3).

Table 5.1 shows the number of APLs identified by the three models for different confidence levels.

| Model | 90% | 95% | 99% |
|---|---|---|---|
| Total Model | 82 | 67 | 51 |
| Vancouver Island Model | 38 | 30 | 21 |
| Lower Mainland Model | 21 | 14 | 6 |

Table 5.1 Number of Accident Prone Locations (by level of confidence)

Figure 5.3 Critical Curves for Total Model (x-axis: Predicted Accidents, acc/3 years; points: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo)

Figure 5.4 Critical Curves for Vancouver Island Model (points: Victoria, Nanaimo)

Figure 5.5 Critical Curves for Lower Mainland Model (x-axis: Predicted Accidents, acc/3 years; points: Surrey, Coquitlam, Vancouver, Burnaby)

The critical curves can also be extended to different K values. Figure 5.6 shows the critical curves for eight different values of K and a confidence level of 95%. The advantage of this kind of curve is that it can be used for any negative binomial model. The only input required is a negative binomial model: the predicted number of accidents is calculated from the model and, according to the model's K value, the critical number of accidents is read from the curve for the corresponding K. The disadvantage of this method is that the results are not as accurate as the previous ones, because in most cases there is no curve for the specific K value (e.g. K = 2.17) and the critical value is estimated by approximating the K value to the closest curve.
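The identification test of Section 5.2 and the search behind the critical curves of Section 5.3 can be sketched in Python as follows. This is a minimal illustration using only the standard library: the gamma distribution function is evaluated with a power series, the prior median is found by bisection, and the critical count is found by scanning integer counts. The function names are hypothetical, and restricting the search to integer counts is a simplification (the curves in the text treat the count as continuous).

```python
import math


def gamma_cdf(x, shape, rate):
    """Gamma cumulative distribution (regularized lower incomplete gamma) via its power series."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def prior_p50(predicted, k):
    """Median of the prior gamma distribution (shape K, rate K/E(A)), by bisection."""
    rate = k / predicted
    lo, hi = 0.0, predicted
    while gamma_cdf(hi, k, rate) < 0.5:   # safety net; the gamma median lies below the mean
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, rate) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prob_exceeds_p50(predicted, k, count):
    """Posterior probability that the site's safety estimate exceeds the prior median."""
    p50 = prior_p50(predicted, k)
    return 1.0 - gamma_cdf(p50, k + count, k / predicted + 1.0)


def critical_count(predicted, k, confidence=0.95, max_count=500):
    """Smallest integer observed count that makes the site accident-prone at the given level."""
    for count in range(max_count + 1):
        if prob_exceeds_p50(predicted, k, count) >= confidence:
            return count
    raise ValueError("no critical count found up to max_count")


print(round(prior_p50(6.88, 1.97), 2))             # prior median for the Section 5.1 example
print(round(prob_exceeds_p50(6.88, 1.97, 11), 2))  # probability of exceeding it with 11 accidents
print(critical_count(6.88, 1.97, confidence=0.95))
```

For the Section 5.1 example this gives a prior median close to the 5.75 acc/3 years quoted in the text and an exceedance probability of about 0.96. Repeating `critical_count` over a range of predicted values traces out one critical curve for a given K and confidence level.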
Figure 5.6 Critical Curves for Different Values of K (x-axis: Predicted Accidents)

Note that the higher the K value, the higher the critical number of accidents. The rationale for this is illustrated in Figure 5.7.

Figure 5.7-a shows the same example as in Section 5.1, but in this case it is assumed to have a K value of 1.0 (low K) and an observed number of accidents of 9.05 acc/3 years (this is the critical number of accidents at the 95% confidence level). Under these conditions the predicted number of accidents is the same, 6.88 acc/3 years, but due to the change in the K value and the observed number of accidents, the $P_{50}$ is 4.77 acc/3 years and the EB estimate is 8.77 acc/3 years. The probability of exceeding the $P_{50}$ value is 95%, a critical condition.

Figure 5.7-a shows that at low values of K, the prior distribution is skewed left, and the EB estimate is close to the observed number of accidents. The reason for this is that low K values increase the variance, leading to more uncertainty about the predicted value. Therefore, the EB estimate is closer to the observed value than to the predicted one.

Figure 5.7-b shows the same model with a considerably higher K value (K = 20). The observed number of accidents is the same as in Figure 5.7-a, but in this case, due to the increase in K, this value is no longer critical. The EB estimate is now closer to the predicted number of accidents than to the observed one, because the variance has decreased, leading to greater confidence in the GLIM model estimate. The prior distribution is less skewed and closer to the posterior distribution.

In order to find the critical number of accidents for the conditions in this case (Figure 5.7-b), it is necessary to raise the observed number of accidents considerably.
Figure 5.7-c shows that the critical value is 15.65 accidents/3 years, which represents an increase of 6.5 accidents/3 years compared with the previous conditions, while the EB estimate has also increased but only by 1.6 acc/3 years. This latter value remains closer to the predicted number of accidents.

Figure 5.7 Comparison of Critical Accidents for Different K Values (x-axis: Accidents/3 years; panels: a) K=1.0 and Observed Accidents=Critical; b) K=20 and Observed Accidents=Critical for K=1; c) K=20 and Observed Accidents=Critical)

5.4 Ranking of Accident Prone Locations

The methods used to identify accident-prone locations explained in the previous two sections can also be useful in ranking these locations. Two ranking criteria can be used. The first is to calculate the ratio between the EB estimate and the predicted frequency (as obtained from the GLIM model) for the accident prone locations identified in the previous section. This ratio represents the deviation of the intersection from the "norm". The higher this ratio, the more accident prone the intersection is. The justification for using this ranking criterion is to ensure that the safety level at each intersection is comparable to that of other intersections with similar characteristics.

Another criterion is to calculate the difference between the EB estimate and the predicted frequency for the accident prone locations. This difference is a good indication of the expected safety benefits and is useful for estimating the pre-implementation safety benefits of countermeasures. Unlike the previous criterion, this one is useful for quantifying economic benefits.

A comparison of the two ranking criteria is shown in Table 5.2 for the Vancouver Island Model. Twenty-one accident prone locations (APLs) were identified at the 99% confidence level.
The table shows the values of both the difference (EB - Predicted) and ratio (EB/Predicted) for all APLs. As shown in Table 5.2, the difference in rank between the two criteria ranges between 1 and 15 with an average value of 4.9. The difference in rank seems to be higher for the top ranked intersections. This can be explained by the different goals of the two criteria. The difference criterion favors intersections with high accident frequency, which are usually more cost-effective to treat. The ratio criterion considers the deviation from the expected values and its variance regardless of the number of accidents observed. This criterion can be considered by road authorities to ensure that the safety of different locations is within acceptable levels.

| Int. No. | Intersection | Observed Frequency | Predicted Accidents | EB Refined | EB-Pred | EB/Pred | Rank EB-Pred | Rank EB/Pred | Rank Diff |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Blanshard-Topaz | 24 | 7.3 | 19.2 | 11.9 | 2.6 | 1 | 5 | 4 |
| 2 | Cook-Kiwanis | 25 | 9.9 | 21.6 | 11.7 | 2.2 | 2 | 14 | 12 |
| 3 | Douglas-Tolmie | 24 | 11.1 | 21.3 | 10.2 | 1.9 | 3 | 18 | 15 |
| 4 | Finlayson-Nanaimo | 18 | 4.3 | 12.4 | 8.2 | 2.9 | 4 | 3 | 1 |
| 5 | Government-Discovery | 18 | 3.4 | 11.3 | 7.9 | 3.3 | 5 | 1 | 4 |
| 6 | Vancouver-Balmoral | 15 | 4.1 | 10.5 | 6.4 | 2.5 | 6 | 7 | 1 |
| 7 | Douglas-Princess | 15 | 3.9 | 10.3 | 6.4 | 2.6 | 7 | 6 | 1 |
| 8 | Cook-View | 15 | 3.7 | 10.0 | 6.3 | 2.7 | 8 | 4 | 4 |
| 9 | Southgate-Vancouver | 14 | 4.7 | 10.4 | 5.7 | 2.2 | 9 | 12 | 3 |
| 10 | Bowen-Pine-Access | 16 | 8.6 | 14.1 | 5.5 | 1.6 | 10 | 21 | 11 |
| 11 | Dallas-Douglas | 14 | 2.4 | 7.6 | 5.2 | 3.2 | 11 | 2 | 9 |
| 12 | Douglas-Discovery | 13 | 3.9 | 9.1 | 5.2 | 2.3 | 12 | 11 | 1 |
| 13 | Albert-Fourth-Pine-Park | 13 | 3.7 | 8.9 | 5.2 | 2.4 | 13 | 9 | 4 |
| 14 | Quadra-Topaz | 13 | 3.6 | 8.7 | 5.2 | 2.5 | 14 | 8 | 6 |
| 15 | Wakesiah-Fourth | 13 | 4.8 | 9.9 | 5.1 | 2.1 | 15 | 17 | 2 |
| 16 | Fairfield-Foulbay | 12 | 4.1 | 8.7 | 4.6 | 2.1 | 16 | 15 | 1 |
| 17 | Douglas-Spruce | 12 | 5.4 | 9.7 | 4.3 | 1.8 | 17 | 20 | 3 |
| 18 | Shelbourne-Pearl | 11 | 3.4 | 7.5 | 4.1 | 2.2 | 18 | 13 | 5 |
| 19 | Quadra-Burdett | 11 | 3.7 | 7.8 | 4.1 | 2.1 | 19 | 16 | 3 |
| 20 | Hillside-Graham | 11 | 2.9 | 6.9 | 4.0 | 2.4 | 20 | 10 | 10 |
| 21 | Quadra-Pembroke | 11 | 4.6 | 8.6 | 3.9 | 1.8 | 21 | 19 | 2 |

Table 5.2 Ranking of APLs for the Vancouver Island Model

Figure 5.8 shows the values of the
two ranking criteria for the top 10 APLs. The figure shows that intersections 1, 3, 4, 5, 6, and 7 are among the top 10 intersections for both methods with an average ranking difference of 5.6. Intersections 2, 8, 9, and 10 are included in the top 10 using the "difference" ranking, but are not included in the "ratio" ranking. Despite showing high expected benefits, these intersections are not among the top 10 in degree of proneness. The same applies to intersections 12, 14, 18 and 21, which show a high degree of proneness but are not among the top 10 in expected benefits.

Figure 5.8 Ranking of Top 10 APLs for Island Model (bars: Difference Ranking, Ratio Ranking; the number inside each bar denotes the intersection number according to Table 5.2)

There are other ranking criteria relating both a ratio and a difference. These other methods involve parameters of accident prediction models such as the predicted vs. observed number of accidents, observed vs. critical curve value, observed vs. EB estimates, etc. These criteria can be implemented by using the same methodology as in this section. There is little research concerning ranking criteria when using accident prediction models, and this is an area that needs further research.

5.5 Before and After Studies

The effect of a safety measure is often studied by comparing the number of accidents observed after the implementation of the measure to the expected number of accidents had the measure not been implemented. In simple before and after studies, the observed number of accidents in the period before the implementation is used to estimate the latter value. However, because of the random variations in accident occurrence (e.g.
the regression to the mean effect), the observed number of accidents before the implementation may not be a good estimate of what would have happened had no measure been implemented. An alternative and more accurate approach is to use the EB refinement process.

Using the same example as before, assume that a specific safety measure to reduce the number of accidents at the intersection was implemented. The observed number of accidents in the three years following the implementation is 8. Therefore, the effectiveness of the measure can be calculated as:

$$\text{Measure of Effectiveness} = 1 - \frac{8}{10.08} = 0.21$$

which indicates a 21% reduction in total accidents because of the treatment.

The importance of using accident prediction models in before and after studies is highlighted by the difficulty of carrying out this analysis in the traditional fashion, via a reference (comparison) group. This group should be of sufficient size and homogeneity to carry out an accurate analysis. The difficulty lies in defining a group with these features. Accident prediction models overcome this difficulty, since they represent local conditions and replace the role of the traditional reference group.

5.6 Safety Comparison of Staggered T and 4-leg Intersections

Several researchers have compared the safety performance of 4-leg intersections and staggered T-intersections. Kulmala (1995) found that, in general, converting a 4-leg intersection into two staggered T-intersections reduces the number of injury accidents if the percentage of traffic entering the junction from the minor road is greater than 5% of the total traffic. He also found that if 50% of the total traffic enters the junction from the minor road, the staggering would reduce the number of injury accidents by 23%. Kulmala (1995) found his results consistent with some Nordic studies, where the staggering was found to decrease the number of injury accidents by 0% to 20%.
In order to confirm these results, a safety comparison of 4-leg and staggered T-intersections was carried out using the models developed in Table 3.5 (the separate T and 4-leg intersection models). According to the analysis made in Section 3.2.2, it is also valid to use the total model with intersection type (Table 3.6), which yields approximately the same results. The following assumptions were made:

1. The traffic volumes on the major and minor roads for the 4-leg intersections are $V_1$ and $V_2$, respectively (expressed in AADT).
2. For the two staggered T-intersections, the traffic volume on the major road is $V_1$, while the minor approaches have traffic volumes of $V_2/2$ (expressed in AADT). This assumption ensures that the traffic volume in both scenarios is the same.
3. The two staggered intersections will not affect each other (isolated intersections). This assumption depends, of course, on the distance between the two intersections.

The results of the comparison are shown in Figure 5.9 for three different minor road traffic volumes. The results indicate that the staggering is effective in reducing the predicted number of accidents. This reduction increases as the traffic volume on the major or minor road increases. It should be noted that the degree of reduction would vary with different ratios of traffic volumes on the major and minor roads.

Figure 5.9 Staggered T vs 4-leg Intersections Safety Comparison (x-axis: Major Road AADT, thousands; panels: a) Minor Road AADT=500 veh/day; b) Minor Road AADT=2,000 veh/day; c) Minor Road AADT=10,000 veh/day)

5.7 Conclusion

This chapter has shown five applications of accident prediction models.
Most of these applications make use of the EB refinement method in order to reduce the regression to the mean bias. It has been shown that accident prediction models are useful in identifying accident prone locations (APLs) with a probabilistic confidence level by using both analytical and graphical methods. It is also possible to rank the APLs by two different criteria, difference and ratio, according to the particular objectives of the road authority. In addition, accident prediction models can be used for evaluating the safety effect of a countermeasure without having to define a reference group, because the GLIM models contain the characteristics of the location. Finally, it was found that staggered T-intersections are safer than 4-leg intersections, a finding that should be taken into account by road planning authorities. These results agree with those found in the literature.

CHAPTER VI
CONCLUSIONS AND RECOMMENDATIONS

6.1 Conclusions

The main objective of this project is to develop accident prediction models for estimating the safety potential of urban unsignalized (T and 4-leg) intersections in the Greater Vancouver Regional District (GVRD) and Vancouver Island on the basis of their traffic characteristics. The models are developed using the generalized linear regression modeling (GLIM) approach, which addresses and overcomes the shortcomings associated with the conventional linear regression approach. The safety predictions obtained from GLIM models can be refined using the Empirical Bayes approach to provide more accurate, site-specific safety estimates. The use of the complementary Empirical Bayes approach can significantly reduce the regression to the mean bias that is inherent in observed accident counts.

This study made use of sample accident and traffic volume data corresponding to unsignalized (both T and 4-leg) intersections located in urban areas of the Greater Vancouver Regional District (GVRD) and Vancouver Island.
The data included a total of 427 intersections located in the cities of Victoria, Surrey, Nanaimo, Coquitlam, Burnaby and Vancouver. The information available for each intersection included the total number of accidents in the 1993-1995 period, traffic volumes for both major and minor roads given in Average Annual Daily Traffic (AADT) and type of intersection (T or 4-leg).

Four categories of models were developed in this study: (1) models for the total number of accidents; (2) separate models for T and 4-leg intersections; (3) separate models for different regions (Vancouver Island, the Lower Mainland and Surrey); and (4) a model for Surrey including intersection control. Table 6.1 summarizes the models' results.

Models developed in this thesis used the negative binomial distribution approach, which has the advantage over the Poisson distribution of accounting for the overdispersion observed in the data. In addition, different tests showed that the maximum likelihood method yields the most appropriate parameters under the negative binomial distribution assumption.

Model | Model form | t-ratio | SD (dof) | $\kappa$ | Pearson $\chi^2$ ($\chi^2$ test)*
Model for the total number of accidents | $\text{Acc/3 yrs} = 1.4929 \times (AADT_{maj\,rd}/1000)^{0.3839} \times (AADT_{min\,rd}/1000)^{0.7044}$ | $a_0$: 3.2, $a_1$: 7.8, $a_2$: 12.4 | 399 (424) | 1.97 | 459 (472)
T-intersection model | $\text{Acc/3 yrs} = 0.9333 \times (AADT_{maj\,rd}/1000)^{0.4531} \times (AADT_{min\,rd}/1000)^{0.5806}$ | $a_0$: -0.3, $a_1$: 5.5, $a_2$: 7.4 | 164 (183) | 2.34 | 205 (214)
4-leg intersection model | $\text{Acc/3 yrs} = 1.6947 \times (AADT_{maj\,rd}/1000)^{0.4099} \times (AADT_{min\,rd}/1000)^{0.7065}$ | $a_0$: 3.6, $a_1$: 6.8, $a_2$: 9.1 | 230 (238) | 2.17 | 251 (274)
Total model with intersection type | $\text{Acc/3 yrs} = 0.5776 \times (AADT_{maj\,rd}/1000)^{0.4221} \times (AADT_{min\,rd}/1000)^{0.6480} \times e^{0.5379 \times Type}$ | $a_0$: -2.8, $a_1$: 8.7, $a_2$: 11.7, $b_1$: 6.3 | 394 (423) | 2.23 | 449 (471)
Vancouver Island model | $\text{Acc/3 yrs} = 1.3807 \times (AADT_{maj\,rd}/1000)^{0.3042} \times (AADT_{min\,rd}/1000)^{0.5488}$ | $a_0$: 2.6, $a_1$: 6.0, $a_2$: 9.0 | 302 (347) | 2.92 | 383 (390)
Lower Mainland model | $\text{Acc/3 yrs} = 6.5929 \times (AADT_{maj\,rd}/1000)^{0.2011} \times (AADT_{min\,rd}/1000)^{0.2864}$ | $a_0$: 7.6, $a_1$: 2.4, $a_2$: 3.7 | 81 (74) | 6.27 | 76 (94)

* Denotes significance at a 95-percent confidence level

Table 6.1 Summary of Accident Prediction Models

Five applications of accident prediction models were used in this thesis. Four of them relate to the use of the Empirical Bayes refinement: identification of accident-prone locations, development of critical accident frequency curves, ranking of the identified accident-prone locations, and before and after safety evaluation. The fifth application provided a safety-planning example, comparing a 4-leg intersection to two staggered T-intersections.

It was shown that accident prediction models are useful in identifying accident prone locations (APLs) with a probabilistic confidence level by using both analytical and graphical methods. It is also possible to rank the APLs by two different criteria, difference and ratio, according to the particular objectives of the road authority. In addition, accident prediction models can be used to evaluate the safety benefits of a countermeasure without having to define a reference group, because the GLIM models contain the characteristics of the location. Finally, it was found that staggered T-intersections are safer than 4-leg intersections, a finding that should be taken into account by road planning authorities. These results agree with previous research conducted in the Scandinavian countries.

6.2 Recommendations for further research

This thesis has developed accident prediction models for urban unsignalized intersections that included independent variables such as traffic volumes and control type.
It is recommended that these models be further refined by adding more variables such as:

Intersection control type: An attempt to develop this model was made in this thesis, but the results showed a poor fit. It is therefore recommended to use a larger sample in order to obtain a significant model, so that the safety effect of intersection control type can be assessed in the same way that this thesis assessed the safety effect of intersection type (T versus 4-leg).

Intersection layout variables: Accident occurrence can be explained by several variables. Including intersection layout variables (e.g. number of lanes on each road, number of left and right turn lanes, pedestrian crosswalks, speed limit, etc.) should enhance our understanding of the relationship between accident occurrence and geometric design.

Accident type: In the safety evaluation of countermeasures it may be necessary to look at individual accident types (e.g. rear-end, right angle, etc.) as opposed to the total number of accidents. Therefore, it is recommended that models for specific accident types be developed.

Finally, as explained earlier, there is a need for more research on ranking accident prone locations. This is particularly important when the road authority has resources to address only a limited number of accident prone locations; in such situations it is important to focus on those with the highest potential for accident reduction or those which deviate most from the normal safety levels for similar locations.

BIBLIOGRAPHY

Belanger, C. (1994). "Estimation of safety of four-leg unsignalized intersections", Transportation Research Record, 1467, Transportation Research Board, National Research Council, Washington, D.C., pp. 23-29.

Bonneson, J. A. and McCoy, P. T. (1993). "Estimation of safety at two-way stop-controlled intersections on rural highways", Transportation Research Record, 1401, Transportation Research Board, National Research Council, Washington, D.C., pp.
83-89.

Bonneson, J. A. and McCoy, P. T. (1997). "Effect of median treatment on urban arterial safety: An accident prediction model", Transportation Research Record, 1581, Transportation Research Board, National Research Council, Washington, D.C., pp. 27-36.

Brüde, U. and Larsson, J. (1988). "The use of prediction models for eliminating effects due to regression-to-the-mean in road accident data", Accident Analysis and Prevention, Vol. 20, No 4, pp. 299-310.

Famoye, F. (1997). "Parameter estimation for generalized negative binomial distribution", Communications in Statistics, Part B: Simulation and Computation, Vol. 26, No 1, pp. 269-279.

Feng, S. and Sayed, T. (1997). "Accident prediction models for signalized intersections". The University of British Columbia, Department of Civil Engineering. Report prepared for the Insurance Corporation of British Columbia.

Hauer, E., Ng, J. C. N. and Lovell, J. (1988). "Estimation of safety at signalized intersections", Transportation Research Record, 1185, Transportation Research Board, National Research Council, Washington, D.C., pp. 48-61.

Hauer, E. (1992). "Empirical Bayes approach to the estimation of 'unsafety': The multivariate regression method", Accident Analysis and Prevention, Vol. 24, No 5, pp. 457-477.

Jovanis, P. P. and Chang, H. L. (1986). "Modeling the relationship of accidents to miles traveled", Transportation Research Record, 1068, Transportation Research Board, National Research Council, Washington, D.C., pp. 42-51.

Kulmala, R. and Roine, M. (1988). "Accident prediction models for two-lane roads in Finland", Conference on traffic safety theory and research methods proceedings, April, Session 4: Statistical analysis and models. Amsterdam: SWOV, pp. 89-103.

Kulmala, R. (1995). "Safety at rural three- and four-arm junctions. Development of accident prediction models", Espoo 1995, Technical Research Centre of Finland, VTT 233.

Lawless, J. (1987). "Negative binomial and mixed Poisson regression", The Canadian Journal of Statistics, Vol. 15, No 3, pp. 209-225.

McCullagh, P. and Nelder, J. A. (1983). "Generalized Linear Models", Chapman and Hall, New York.

Maher, M. J. and Summersgill, I. (1996). "A comprehensive methodology for the fitting of predictive accident models", Accident Analysis and Prevention, Vol. 28, No 3, pp. 281-296.

Maycock, G. and Hall, R. D. (1984). "Accidents at 4-arm roundabouts", Transport and Road Research Laboratory. TRRL Laboratory Report 1120.

Miaou, S. and Lum, H. (1993). "Modeling vehicle accidents and highway geometric design relationships", Accident Analysis and Prevention, Vol. 25, No 6, pp. 689-709.

Mountain, L. and Fawaz, B. (1996). "Estimating accidents at junctions using routinely-available input data", Traffic Engineering and Control, Vol. 37, No 11, pp. 624-628.

Numerical Algorithms Group (NAG) (1994). "The GLIM system. Release 4 manual", The Royal Statistical Society.

Numerical Algorithms Group (NAG) (1996). "GLIM 4. Macro Library Manual, Release 2", The Royal Statistical Society.

Saccomanno, F. F. and Buyco, C. (1988). "Generalized loglinear models of truck accident rates". Paper presented at Transportation Research Board 67th annual meeting. Washington, D.C.

Satterthwaite, S. P. (1981). "A survey of research into relationships between traffic accidents and traffic volumes", Transport and Road Research Laboratory. TRRL Supplementary Report 692.

Stevens, J. (1986). "Applied multivariate statistics for the social sciences", Lawrence Erlbaum Associates, Inc., Publishers, Hillsdale, N.J.
APPENDIX I
RESULTS OF OUTLIERS IDENTIFICATION

Figure AI-1 Identification of the Highest Cook's Distance Values for Total Model with Intersection Type [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 157 | 426 | 390.69 | 3.63 | 3.63 | 3.84
2 | 45 | 425 | 389.49 | 1.20 | 4.83 | 5.99
3 | 51 | 424 | 388.36 | 1.13 | 5.96 | 7.81
4 | 14 | 423 | 387.19 | 1.17 | 7.13 | 9.49
5 | 44 | 422 | 385.66 | 1.53 | 8.66 | 11.07
6 | 22 | 421 | 383.85 | 1.81 | 10.47 | 12.59
7 | 5 | 420 | 382.25 | 1.60 | 12.07 | 14.07
8 | 224 | 419 | 380.45 | 1.80 | 13.87 | 15.51

Table AI-1 Identification of Outliers for Total Model with Intersection Type

Figure AI-2 Identification of the Highest Cook's Distance Values for T-Intersection Model [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 66 | 185 | 160.61 | 3.00 | 3.00 | 3.84
2 | 9 | 184 | 159.54 | 1.07 | 4.07 | 5.99
3 | 14 | 183 | 158.15 | 1.39 | 5.46 | 7.81
4 | 53 | 182 | 157.21 | 0.94 | 6.40 | 9.49
5 | 8 | 181 | 156.03 | 1.18 | 7.58 | 11.07
6 | 103 | 180 | 154.62 | 1.41 | 8.99 | 12.59

Table AI-2 Identification of Outliers for T-Intersection Model

Figure AI-3 Identification of the Highest Cook's Distance Values for 4-leg Intersection Model [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 12 | 240 | 229.12 | 1.18 | 1.18 | 3.84
2 | 20 | 239 | 227.18 | 1.94 | 3.12 | 5.99
3 | 3 | 238 | 225.32 | 1.86 | 4.98 | 7.81
4 | 29 | 237 | 223.38 | 1.94 | 6.92 | 9.49
5 | 223 | 236 | 221.94 | 1.44 | 8.36 | 11.07
6 | 159 | 235 | 219.83 | 2.11 | 10.47 | 12.59
7 | 133 | 234 | 218.25 | 1.58 | 12.05 | 14.07
8 | 231 | 233 | 217.27 | 0.98 | 13.03 | 15.51

Table AI-3 Identification of Outliers for 4-Leg Intersection Model

Figure AI-4 Identification of the Highest Cook's Distance Values for Vancouver Island Model [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 101 | 349 | 298.88 | 2.97 | 2.97 | 3.84
2 | 41 | 348 | 298.01 | 0.87 | 3.84 | 5.99
3 | 73 | 347 | 297.44 | 0.57 | 4.41 | 7.81
4 | 200 | 346 | 295.41 | 2.03 | 6.44 | 9.49
5 | 326 | 345 | 294.14 | 1.27 | 7.71 | 11.07
6 | 222 | 344 | 291.91 | 2.23 | 9.94 | 12.59
7 | 89 | 343 | 290.04 | 1.87 | 11.81 | 14.07

Table AI-4 Identification of Outliers for Vancouver Island Model

Figure AI-5 Identification of the Highest Cook's Distance Values for Lower Mainland Model [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 14 | 76 | 80.08 | 0.95 | 0.95 | 3.84
2 | 5 | 75 | 78.81 | 1.27 | 2.22 | 5.99
3 | 22 | 74 | 77.47 | 1.34 | 3.56 | 7.81
4 | 74 | 73 | 73.99 | 3.48 | 7.04 | 9.49
5 | 66 | 72 | 70.52 | 3.47 | 10.51 | 11.07
6 | 10 | 71 | 69.49 | 1.03 | 11.54 | 12.59
7 | 57 | 70 | 67.94 | 1.55 | 13.09 | 14.07

Table AI-5 Identification of Outliers for Lower Mainland Model

Figure AI-6 Identification of the Highest Cook's Distance Values for Surrey Total Model [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 14 | 55 | 54.60 | 1.12 | 1.12 | 3.84
2 | 5 | 54 | 53.14 | 1.45 | 2.58 | 5.99
3 | 4 | 53 | 52.26 | 0.88 | 3.45 | 7.81

Table AI-6 Identification of Outliers for Surrey Total Model

Figure AI-7 Identification of the Highest Cook's Distance Values for Surrey Total Model with Intersection Control Type [plot not reproduced; horizontal axis: Observed Accidents (acc/3 years); marked points denote high Cook's distance]

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | $\chi^2$
1 | 14 | 55 | 54.40 | 1.08 | 1.08 | 3.84
2 | 51 | 54 | 53.40 | 1.00 | 2.08 | 5.99
3 | 45 | 53 | 52.40 | 1.00 | 3.08 | 7.81
4 | 10 | 52 | 51.49 | 0.90 | 3.98 | 9.49
5 | 5 | 51 | 50.15 | 1.35 | 5.33 | 11.07
6 | 49 | 50 | 49.35 | 0.80 | 6.13 | 12.59
7 | 4 | 49 | 48.40 | 0.95 | 7.08 | 14.07

Table AI-7 Identification of Outliers for Surrey Total Model with Intersection Control Type

APPENDIX II
PREDICTION RATIO VS. ACCIDENT FREQUENCY

Figure AII-1 AR vs. Accident Frequency for Total Model [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs); legend: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo]
Figure AII-2 AR vs. Accident Frequency for T-intersection Model [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs)]

Figure AII-3 AR vs. Accident Frequency for 4-leg Intersection Model [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs); legend: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo]

Figure AII-4 AR vs. Accident Frequency for Total Model Including Intersection Type [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs); legend: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo]

Figure AII-5 AR vs. Accident Frequency for Vancouver Island Model [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs); legend: Victoria, Nanaimo]

Figure AII-6 AR vs. Accident Frequency for Lower Mainland Model [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs); legend: Surrey, Coquitlam, Vancouver, Burnaby]

Figure AII-7 AR vs. Accident Frequency for Surrey Total Model [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs)]

Figure AII-8 AR vs. Accident Frequency for Surrey Model with Control Type [plot not reproduced; horizontal axis: Predicted Accidents (acc/3 yrs)]

APPENDIX III
GLIM SESSION FOR ESTIMATING APM

[o] GLIM 4, update 8 for IBM etc. 80386 PC / DOS on 28-Oct-1997 at 09:33:59
[o] (copyright) 1992 Royal Statistical Society, London
[o]
[i] ? $C ACCIDENT PREDICTION MODELS FOR UNSIGNALIZED INTERSECTIONS$
[i] ? $C TOTAL MODEL$
[i] ? $Units 427$
[i] ? $Data V1 V2 Total Total_3yr Type$
[i] ? $Dinput 'unsigall.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 1576.8 at cycle 4
[o]     residual df = 424
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.3943     0.07586     1
[o]     2      0.4067     0.02814     LV1
[o]     3      0.6086     0.02636     LV2
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] !***********************************************************
[e] $echo off$
[h]
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 398.83 (change = -1178.) at cycle 3
[o]     residual df = 424    (change = 0 )
[o]
[o] ML Estimate of THETA = 1.966
[o]            Std Error = ( 0.1828)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 4697. on 424 df
[o] 2 x Full Log-likelihood = -2138.
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.4007      0.1233     1
[o]     2      0.3839     0.04940     LV1
[o]     3      0.7044     0.05676     LV2
[o] scale parameter 1.000
[o]
[i] ? $C END OF TOTAL MODEL$
[i] ? $C TOTAL MODEL INCLUDING INTERSECTION TYPE$
[i] ? $Yvar Total_3yr $Error P $Link L$
[w] -- model changed
[i] ? $Fit LV1+LV2+Type $D E$
[o] scaled deviance = 1435.8 at cycle 4
[o]     residual df = 423
[o]
[o]          estimate        s.e.     parameter
[o]     1     -0.5266      0.1114     1
[o]     2      0.4336     0.02801     LV1
[o]     3      0.5937     0.02721     LV2
[o]     4      0.5268     0.04564     TYPE
[o] scale parameter 1.000
[o]
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 394.32 (change = -1042.) at cycle 2
[o]     residual df = 423    (change = 0 )
[o]
[o] ML Estimate of THETA = 2.228
[o]            Std Error = ( 0.2170)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 4734. on 423 df
[o] 2 x Full Log-likelihood = -2100.
[o]
[o]          estimate        s.e.     parameter
[o]     1     -0.5488      0.1920     1
[o]     2      0.4221     0.04843     LV1
[o]     3      0.6480     0.05534     LV2
[o]     4      0.5379     0.08504     TYPE
[o] scale parameter 1.000
[o]
[i] ? $C END OF TOTAL MODEL INCLUDING INTERSECTION TYPE$
[i] ? $C MODEL FOR T INTERSECTIONS$
[i] ? $Units 186$
[i] ? $Data V1 V2 Total Total_3yr$
[i] ? $Dinput 'unsigt.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 485.10 at cycle 4
[o]     residual df = 183
[o]
[o]          estimate        s.e.     parameter
[o]     1     -0.3980      0.1675     1
[o]     2      0.5809     0.05957     LV1
[o]     3      0.5902     0.04250     LV2
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] !***********************************************************
[e] $echo off$
[h]
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 163.61 (change = -321.5) at cycle 4
[o]     residual df = 183    (change = 0 )
[o]
[o] ML Estimate of THETA = 2.345
[o]            Std Error = ( 0.3825)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 995.9 on 183 df
[o] 2 x Full Log-likelihood = -805.6
[o]
[o]          estimate        s.e.     parameter
[o]     1    -0.06907      0.2149     1
[o]     2      0.4531     0.08263     LV1
[o]     3      0.5856     0.07892     LV2
[o] scale parameter 1.000
[o]
[i] ? $C END OF T INTERSECTION MODEL$
[i] ? $C FOUR LEGGED INTERSECTION MODEL$
[i] ? $Units 241$
[i] ? $Data V1 V2 Total Total_3yr$
[i] ? $Dinput 'unsig4.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 942.29 at cycle 4
[o]     residual df = 238
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.6422     0.08496     1
[o]     2      0.3884     0.03194     LV1
[o]     3      0.5944     0.03557     LV2
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] !***********************************************************
[e] $echo off$
[h]
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 230.30 (change = -712.0) at cycle 3
[o]     residual df = 238    (change = 0 )
[o]
[o] ML Estimate of THETA = 2.172
[o]            Std Error = ( 0.2647)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 3740. on 238 df
[o] 2 x Full Log-likelihood = -1293.
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.5275      0.1481     1
[o]     2      0.4099     0.06025     LV1
[o]     3      0.7065     0.07740     LV2
[o] scale parameter 1.000
[o]
[i] ? $C END OF FOUR LEGGED INTERSECTION MODEL$
[i] ? $C ISLAND MODEL$
[i] ? $Units 350$
[i] ? $Data V1 V2 Total Total_3yr$
[i] ? $Dinput 'island.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 734.32 at cycle 4
[o]     residual df = 347
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.2872     0.09310     1
[o]     2      0.3231     0.03632     LV1
[o]     3      0.5240     0.03783     LV2
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] !***********************************************************
[e] $echo off$
[h]
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 301.85 (change = -432.5) at cycle 2
[o]     residual df = 347    (change = 0 )
[o]
[o] ML Estimate of THETA = 2.920
[o]            Std Error = ( 0.3861)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 985.9 on 347 df
[o] 2 x Full Log-likelihood = -1473.
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.3226      0.1224     1
[o]     2      0.3042     0.05025     LV1
[o]     3      0.5488     0.06123     LV2
[o] scale parameter 1.000
[o]
[i] ? $C END OF ISLAND MODEL$
[i] ? $C MAINLAND MODEL$
[i] ? $Units 77$
[i] ? $Data V1 V2 Total Total_3yr$
[i] ? $Dinput 'mainland.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 249.82 at cycle 4
[o]     residual df = 74
[o]
[o]          estimate        s.e.     parameter
[o]     1       1.912      0.1410     1
[o]     2      0.2036     0.04688     LV1
[o]     3      0.2474     0.04259     LV2
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] $echo off$
[h]
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 81.024 (change = -168.8) at cycle 3
[o]     residual df = 74     (change = 0 )
[o]
[o] ML Estimate of THETA = 6.265
[o]            Std Error = ( 1.486)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 3870. on 74 df
[o] 2 x Full Log-likelihood = -505.9
[o]
[o]          estimate        s.e.     parameter
[o]     1       1.886      0.2478     1
[o]     2      0.2011     0.08382     LV1
[o]     3      0.2864     0.07817     LV2
[o] scale parameter 1.000
[o]
[i] ? $C END OF MAINLAND MODEL$
[i] ? $C TOTAL MODEL FOR SURREY$
[i] ? $Units 56$
[i] ? $Da V1 V2 Total_3yr TControl$
[i] ? $Dinput 'unsry.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 151.51 at cycle 3
[o]     residual df = 53
[o]
[o]          estimate        s.e.     parameter
[o]     1       2.148      0.1532     1
[o]     2      0.1529     0.05082     LV1
[o]     3      0.1720     0.05030     LV2
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] $echo off$
[i] ? $Numer theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 55.716 (change = -95.80) at cycle 3
[o]     residual df = 53     (change = 0 )
[o]
[o] ML Estimate of THETA = 8.893
[o]            Std Error = ( 2.639)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 3007. on 53 df
[o] 2 x Full Log-likelihood = -360.5
[o]
[o]          estimate        s.e.     parameter
[o]     1       2.133      0.2439     1
[o]     2      0.1516     0.08195     LV1
[o]     3      0.1907     0.08343     LV2
[o] scale parameter 1.000
[o]
[i] ? $C END OF SURREY TOTAL MODEL$
[i] ? $C MODEL FOR SURREY INTERSECTION INCLUDING TYPE OF CONTROL$
[i] ? $Units 56$
[i] ? $Da V1 V2 Total_3yr TControl$
[i] ? $Dinput 'unsry.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2+TControl $D E$
[o] scaled deviance = 149.55 at cycle 3
[o]     residual df = 52
[o]
[o]          estimate        s.e.     parameter
[o]     1       2.191      0.1571     1
[o]     2      0.1647     0.05173     LV1
[o]     3      0.1958     0.05310     LV2
[o]     4    -0.05703     0.04071     TCONTROL
[o] scale parameter 1.000
[o]
[i] ? $Input %plc 80 NEGBIN.MAC$
[e] !***********************************************************
[e] !***********************************************************
[e] $echo off$
[i] ? $Number theta=0$
[i] ? $Use negbin theta $D E$
[o] scaled deviance = 55.476 (change = -94.07) at cycle 3
[o]     residual df = 52     (change = 0 )
[o]
[o] ML Estimate of THETA = 9.096
[o]            Std Error = ( 2.714)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 3008. on 52 df
[o] 2 x Full Log-likelihood = -359.4
[o]
[o]          estimate        s.e.     parameter
[o]     1       2.185      0.2491     1
[o]     2      0.1645     0.08265     LV1
[o]     3      0.2256     0.08752     LV2
[o]     4    -0.06994     0.06764     TCONTROL
[o] scale parameter 1.000
[o]
[i] ? $C END OP SURREY MODEL$
[i] ? $Stop

APPENDIX IV
GLIM SESSION FOR DIFFERENT NEGATIVE BINOMIAL METHODS

[o] GLIM 4, update 8 for IBM etc. 80386 PC / DOS on 09-NOV-1997 at 01:14:42
[o] (copyright) 1992 Royal Statistical Society, London
[o]
[i] ? $!TOTAL MODEL ESTIMATION OF NEGATIVE BINOMIAL DISTRIBUTION PARAMETERS!
[i] ? $!METHOD OF MAXIMUM LIKELIHOOD (NAG, 1996)!
[i] ? $units 427$
[i] ? $data v1 v2 total total_3yr type$
[i] ? $dinput 'unsigall.txt'$
[i] ? $calc lv1=%log(v1) : Lv2=%log(V2)$
[i] ? $Yvar total_3yr $error P $link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 1576.8 at cycle 4
[o]     residual df = 424
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.3943     0.07586     1
[o]     2      0.4067     0.02814     LV1
[o]     3      0.6086     0.02636     LV2
[o] scale parameter 1.000
[o]
[i] ? $input %plc 80 NEGBIN.MAC$
[e] !******************************************************
[e] ! Author: John Hinde, MSOR Department, University of Exeter
[e] !         jph@msor.ex.ac.uk
[e] ! Version: 1.1 GLIM4 February 1996
[e] !
[e] ! Main Macros:
[e] !   NEGBIN  Fits a negative binomial distribution for
[e] !           overdispersed count data. For details on the
[e] !           negative binomial distribution see Lawless (1987)
[e] !           Canadian J. of Stats, 15, 209-225.
[e] !           The overdispersion parameter theta can be fixed
[e] !           or estimated, using an inner loop embedded
[e] !           within the model fitting process. If the
[e] !           specified parameter value is zero, estimation
[e] !           is performed using either maximum likelihood (default),
[e] !           the expected value of the chi-squared statistic
[e] !           as in Breslow, N.E. (1984) Applied Statistics
[e] !           33, p38-44, or the mean deviance.
[e] !
[e] !           Prior to using this macro the following model
[e] !           aspects need to be declared:
[e] !
[e] !           y-variate: use $YVAR <yvariate>
[e] !
[e] !           model formulae: this will be taken from the last fit
[e] !                           directive, or can be explicitly set using
[e] !                           $TERMS <model formula>
[e] !
[e] !           link function: set using $LINK
[e] !                          permissible values i, l, s
[e] !
[e] !   Formal arguments:
[e] !     theta  (obligatory) scalar for negative binomial
[e] !            parameter estimate
[e] !            if theta=0 estimation is performed
[e] !            if theta /=0 used as fixed value in negative
[e] !            binomial fit
[e] !     method (optional) Scalar controlling estimation method when
[e] !            appropriate
[e] !            1 = maximum likelihood (default if theta=0)
[e] !            2 = mean chi-square estimation
[e] !            3 = mean deviance estimation
[e] !            4 = use fixed value of theta (default if theta /=0 )
[e] !     tol    (optional) Scalar specifies tolerance criterion to
[e] !            control convergence of iteration on theta.
[e] !            Defaults to 0.0001.
[e] !            If tol<=0 then convergence criterion is set to %cc,
[e] !            the system convergence criterion,
[e] !
[e] !   Output:
[e] !     Displays the negative binomial deviance, the degrees of freedom
[e] !     for the fitted regression model, the estimate of theta, its
[e] !     standard error when using maximum likelihood estimation,
[e] !     and values of the log-likelihood. The deviance provides a
[e] !     goodness-of-fit measure for a negative binomial
[e] !     distribution with the current value of theta.
[e] !     When theta is fixed deviance differences can be used to
[e] !     assess the importance of model terms.
[e] !     To compare models with different values of theta the
[e] !     log-likelihood must be used.
[e] !     In particular, this applies for comparisons with
[e] !     the standard Poisson model (theta=infinity)
[e] !     The log-likelihoods are those for the negative binomial
[e] !     distribution, the full version including the y! terms.
[e] !
[e] !   Side Effects:
[e] !     On exit from the macro the model is still defined with
[e] !     a negative binomial variance function. Submodels can then
[e] !     be fitted directly with $FIT directives. This will work
[e] !     fine following a fixed parameter fit, but should be
[e] !     used with caution if theta was estimated - use of $RECYCLE
[e] !     could help things in this case.
[e] !
[e] !   Example of use:
[e] !     $yvar y $link l $terms 11$
[e] !     $number theta=0 $
[e] !     $use negbin theta$
[e] !
[e] !   NB_OUT  Can be used after subsequent $FIT directives to obtain
[e] !           output given by NEGBIN, i.e. the estimate of theta, its
[e] !           standard error for maximum likelihood fits and the
[e] !           log-likelihood values,
[e] !
[e] !   Formal arguments:
[e] !     theta  (obligatory) scalar for negative binomial
[e] !            parameter estimate
[e] !
[e] !   Example of use:
[e] !     $yvar y $link l $terms 11$
[e] !     $number theta=0 $
[e] !     $use negbin theta$
[e] !     $recy $fit -11$
[e] !     $use nb_out$
[e] !
[e] !
[e] ! To delete macros and global variables, type
[e] !     $delete #d_negbin d_negbin $
[e] !
[e] !******************************************************
[e] $echo off$
[f] ** identifier expected but not found, at [80 NEGBIN.]
[f]
[h] The $INP directive expected an identifier but found the character instead.
[h] Check the syntax of the directive,
[h]
[i] ? $number theta=0$
[i] ? $number mode=1$
[i] ? $use negbin theta mode $D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 398.83 (change = -1178.) at cycle 3
[o]     residual df = 424    (change = 0 )
[o]
[o] ML Estimate of THETA = 1.966
[o]            Std Error = ( 0.1828)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 4697. on 424 df
[o] 2 x Full Log-likelihood = -2138.
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.4007      0.1233     1
[o]     2      0.3839     0.04940     LV1
[o]     3      0.7044     0.05676     LV2
[o] scale parameter 1.000
[o]
[i] ? $!METHOD OF MEAN CHI-SQUARE (NAG, 1996)!
[i] ? $number theta=0 : mode=2$
[i] ? $use negbin theta mode $D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 370.24 (change = -1207.) at cycle 3
[o]     residual df = 424    (change = 0 )
[o]
[o] Mean Chi-squared estimate of THETA = 1.764
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 4696. on 424 df
[o] 2 x Full Log-likelihood = -2139.
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.4030      0.1269     1
[o]     2      0.3827     0.05104     LV1
[o]     3      0.7058     0.05905     LV2
[o] scale parameter 1.000
[o]
[i] ? $!METHOD OF MEAN DEVIANCE (NAG, 1996)!
[i] ? $number theta=0 : mode=3$
[i] ? $use negbin theta mode $D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 424.00 (change = -1153.) at cycle 2
[o]     residual df = 424    (change = 0 )
[o]
[o] Mean Deviance estimate of THETA = 2.154
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o] 2 x Log-likelihood = 4696. on 424 df
[o] 2 x Full Log-likelihood = -2139.
[o]
[o]          estimate        s.e.     parameter
[o]     1      0.3991      0.1201     1
[o]     2      0.3850     0.04803     LV1
[o]     3      0.7023     0.05490     LV2
[o] scale parameter 1.000
[o]
[i] ? $!TOTAL MODEL ESTIMATION OF NEGATIVE BINOMIAL DISTRIBUTION PARAMETERS!
[i] ? $!FOLLOWING THE METHOD OF MOMENTS PROPOSED BY KULMALA(1995) AND MAYCOCK!
[i] ? $! AND HALL (1984)!
[i] ? $Units 427$
[i] ? $Da V1 V2 Total Total_3yr Type$
[i] ? $Dinput 'unsigall.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 1576.8 at cycle 4
[o]     residual df = 424
[o]
[o]          estimate        s.e.
parameter [o] 1 0.3943 0.07586 1 [o] 2 0.4067 0.02814 LV1 [o] 3 0.6086 0.02636 LV2 111 Appendix IV: GLIM Session for Different Negative Binomial Methods [o] scale parameter 1.000 o] ii] ? $Number k=1.733$ !i] ? $MACRO NEGBIN! ii] $MAC? $Calc %va=%fv+(%fv**2)/k$ !i] $MAC? $Calc %di=2*(%yv*%log(%yv/%fv)-(%yv+k)*%log((%yv+k)/(%fv+k)))$ !i] $MAC? $ENDMAC$ !i] ? $Edit 74 Total_3yr 0.0001 : 118 Total_3yr 0.0001 : 228 Total_3yr 0001$ V] -- change to data values a f f e c t s model ;i] ? $Edit 238 Total_3yr 0.0001 : 375 Total_3yr 0.0001$ !i] ? $Yvar Total_3yr $Error Own NEGBIN $Link L$ !i] ? $Fit LV1+LV2 $D E$ ,o] deviance = 365.66 at cycle 5 [o] r e s i d u a l df = 424 o] ,o] estimate s.e. parameter o] 1 0.4033 0.1184 1 o] 2 0.3825 0.04765 LV1 O] 3 0.7061 0.05520 LV2 o] scale parameter 0.8624 o] i] ? $Number k=1.8474$ ,i] ? $Yvar Total_3yr $Error Own NEGBIN $Link L$ !w] -- model changed i] ? $Fit LV1+LV2 $D E$ o] deviance = 382.23 at cycle 5 o] r e s i d u a l df = 424 o] o] estimate s.e. parameter o] 1 0.4019 0.1190 1 o] 2 0.3832 0.04779 LV1 O] 3 0.7054 0.05513 LV2 o] scale parameter 0.9015 'o] ;i] ? $Number k=1.8472$ ;i] ? $Yvar Total_3yr $Error Own NEGBIN $Link L$ w] -- model changed !i] ? $Fit LV1+LV2 $D E$ o] deviance = 382.20 at cycle 5 [o] r e s i d u a l df = 424 'o] ,o] estimate s.e. parameter o] 1 0.4019 0.1190 1 !o] 2 0.3832 0.04779 LV1 o] 3 0.7054 0.05513 LV2 .o] scale parameter 0.9014 o] i] ? $Stop 112
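
The user-defined NEGBIN macro in the last part of the session sets a variance function (%va) and a deviance increment (%di) for a fixed value of k. As a rough illustration only, the same two expressions can be sketched in Python. The function names, the helper `predicted_mean`, and the example inputs are hypothetical and are not part of the thesis or of GLIM. One stated difference: the session appears to replace some counts with 0.0001 through the $Edit lines, presumably so that the logarithm of %yv/%fv is defined, whereas this sketch takes the limiting value 0 for that term when the count is zero.

```python
import math


def nb_variance(mu, k):
    """Variance function from the macro: %va = %fv + %fv**2 / k."""
    return mu + mu ** 2 / k


def nb_unit_deviance(y, mu, k):
    """Deviance increment from the macro for one observation:
    %di = 2*(y*log(y/mu) - (y+k)*log((y+k)/(mu+k))).

    Assumption of this sketch: when y == 0 the term y*log(y/mu)
    is set to its limit, 0, instead of editing the data value.
    """
    first = y * math.log(y / mu) if y > 0 else 0.0
    second = (y + k) * math.log((y + k) / (mu + k))
    return 2.0 * (first - second)


def predicted_mean(v1, v2, b0, b1, b2):
    """Fitted mean under a log link with terms LV1 + LV2,
    i.e. exp(b0 + b1*log(v1) + b2*log(v2)). Hypothetical helper."""
    return math.exp(b0 + b1 * math.log(v1) + b2 * math.log(v2))


if __name__ == "__main__":
    k = 1.8472                       # final k used in the session
    b0, b1, b2 = 0.4019, 0.3832, 0.7054  # final estimates in the session
    # Hypothetical inputs; the units of V1 and V2 are not stated in the log.
    mu = predicted_mean(10.0, 2.0, b0, b1, b2)
    print("fitted mean:", mu)
    print("variance:", nb_variance(mu, k))
    print("deviance increment for y = 3:", nb_unit_deviance(3, mu, k))
```

The deviance increment is zero when the observed count equals the fitted mean and positive otherwise, which is the property a GLIM own-error macro relies on when it sums %di over the observations.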