
UBC Theses and Dissertations


Accident prediction models for unsignalized intersections Rodríguez, Luis F. (Luis Felipe) 1998-12-31


ACCIDENT PREDICTION MODELS FOR UNSIGNALIZED INTERSECTIONS

by

LUIS FELIPE RODRIGUEZ

B.Sc. (Civil Engineering), Universidad de los Andes, Bogota, Colombia, 1990
M.Sc. (Civil Engineering), Universidad de los Andes, Bogota, Colombia, 1992

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF CIVIL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
APRIL, 1998
© Luis Felipe Rodriguez, 1998

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at The University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Civil Engineering
The University of British Columbia
2324 Main Mall
Vancouver, B.C.
Canada, V6T 1Z4

ABSTRACT

The main objective of this thesis is to develop Accident Prediction Models (APM) for estimating the safety potential of urban unsignalized (T and 4-leg) intersections in the Greater Vancouver Regional District (GVRD) and Vancouver Island on the basis of their traffic characteristics. The models are developed using the generalized linear regression modeling (GLIM) approach, which addresses and overcomes the shortcomings associated with the conventional linear regression approach. The safety predictions obtained from GLIM models can be refined using the Empirical Bayes' approach to provide more accurate, site-specific safety estimates.
The use of the complementary Empirical Bayes approach can significantly reduce the regression-to-the-mean bias that is inherent in observed accident counts. The thesis made use of sample accident and traffic volume data corresponding to unsignalized (both T and 4-leg) intersections located in urban areas of the GVRD and Vancouver Island. The data included a total of 427 intersections located in the cities of Victoria, Surrey, Nanaimo, Coquitlam, Burnaby and Vancouver. The information available for each intersection included the total number of accidents in the 1993-1995 period, traffic volumes for both major and minor roads given as Average Annual Daily Traffic (AADT), and the type of intersection (T or 4-leg). Four categories of models were developed in this study: (1) models for the total number of accidents; (2) separate models for T and 4-leg intersections; (3) separate models for different regions (Vancouver Island, the Lower Mainland and Surrey); and (4) a model for Surrey including intersection control. Five applications of APM were used in this thesis. Four of them relate to the use of the Empirical Bayes refinement: identification of accident-prone locations, development of critical accident frequency curves, ranking of the identified accident-prone locations, and before-and-after safety evaluation. The fifth application provides a safety-planning example, comparing the safety of a 4-leg intersection to that of two staggered T-intersections. These applications show the importance of implementing APM as a tool for reliably assessing traffic safety and for designing safety strategies.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER I: INTRODUCTION
1.1 Background
1.2 Thesis Structure
CHAPTER II: LITERATURE REVIEW
2.0 Introduction
2.1 Shortcomings Associated with Conventional Linear Regression Models
2.2 Generalized Linear Models (GLIM)
2.3 Testing the Models Significance
2.4 Model Structure
2.5 Location Specific Prediction: The Empirical Bayes Refinement
2.6 Previous work
2.7 Conclusion
CHAPTER III: DATA COLLECTION AND MODEL DEVELOPMENT
3.0 Introduction
3.1 Data Collection
3.1.1 Accident and Traffic Volume Data
3.1.2 Outlier Analysis
3.2 Model Development
3.2.1 Model for the Total Number of Accidents
3.2.2 Models for T and 4-leg Intersections
3.2.3 Regional Models
3.2.4 Effect of Intersection Control Type
3.3 Comparison with Previous Results
3.4 Conclusion
CHAPTER IV: STATISTICAL CONSIDERATIONS
4.0 Introduction
4.1 Poisson vs. Negative Binomial Distribution Error Structure
4.2 Approaches for Estimating the Negative Binomial Distribution Parameter K
4.3 Conclusion
CHAPTER V: APPLICATIONS
5.0 Introduction
5.1 Empirical Bayes Refinement
5.2 Identification of Accident Prone Locations
5.3 Critical Accident Frequency Curves
5.4 Ranking of Accident Prone Locations
5.5 Before and After Studies
5.6 Safety Comparison of Staggered T and 4-leg Intersections
5.7 Conclusion
CHAPTER VI: CONCLUSIONS AND RECOMMENDATIONS
6.1 Conclusions
6.2 Recommendations for further research
BIBLIOGRAPHY
APPENDIX I: RESULTS OF OUTLIERS IDENTIFICATION
APPENDIX II: PREDICTION RATIO VS. ACCIDENT FREQUENCY
APPENDIX III: GLIM SESSION FOR ESTIMATING APM
APPENDIX IV: GLIM SESSION FOR DIFFERENT NEGATIVE BINOMIAL METHODS

LIST OF TABLES

Table 3.1 Summary of Accident, Intersection Control and Traffic Volume Data
Table 3.2 Statistical Summary of Accidents
Table 3.3 Identification of Outliers for Total Model
Table 3.4 Model for the Total Number of Accidents
Table 3.5 Models for T and 4-leg Intersections
Table 3.6 Total Model Including the Effect of Intersection Type
Table 3.7 Intersection Type Model: 2 Separate Models vs Single Model
Table 3.8 Regional Models
Table 3.9 Surrey Total Model with Control Type
Table 4.1 Comparison between Poisson and Negative Binomial Distribution
Table 4.2 Results of Different Negative Binomial Methods in the Total Model
Table 5.1 Number of Accident Prone Locations
Table 5.2 Ranking of APLs for The Vancouver Island Model
Table 6.1 Summary of Accident Prediction Models
Table AI-1 Identification of Outliers for Total Model with Intersection Type
Table AI-2 Identification of Outliers for T-Intersection Model
Table AI-3 Identification of Outliers for 4-Leg Intersection Model
Table AI-4 Identification of Outliers for Vancouver Island Model
Table AI-5 Identification of Outliers for Lower Mainland Model
Table AI-6 Identification of Outliers for Surrey Total Model
Table AI-7 Identification of Outliers for Surrey Total Model with Intersection Control Type

LIST OF FIGURES

Figure 2.1 Empirical Bayes' Estimate for Different K Values
Figure 3.1 Identification of the Highest Cook's Distance Values for Total Model
Figure 3.2 Total Model: Observed vs. Predicted Number of Accidents
Figure 3.3 Total Model: Predicted Accidents vs Estimated Variance
Figure 3.4 T-Intersection Model: Observed vs. Predicted Number of Accidents
Figure 3.5 T-Intersection Model: Predicted Accidents vs Estimated Variance
Figure 3.6 4-leg Intersection Model: Observed vs. Predicted Number of Accidents
Figure 3.7 4-leg Intersection Model: Predicted Accidents vs Estimated Variance
Figure 3.8 Total Model with the Effect of Intersection Type: Observed vs. Predicted Number of Accidents
Figure 3.9 Total Model with the Effect of Intersection Type: Predicted Accidents vs Estimated Variance
Figure 3.10 Intersection Type Model: Separate vs. Single Models
Figure 3.11 Vancouver Island Model: Observed vs Predicted Number of Accidents
Figure 3.12 Vancouver Island Model: Predicted Accidents vs Estimated Variance
Figure 3.13 Lower Mainland Model: Observed vs Predicted Number of Accidents
Figure 3.14 Lower Mainland Model: Predicted Accidents vs Estimated Variance
Figure 3.15 Surrey Total Model: Observed vs Predicted Number of Accidents
Figure 3.16 Surrey Total Model: Predicted Accidents vs Estimated Variance
Figure 3.17 Comparison of Total Model with Regional Models
Figure 3.18 Surrey Total Model with Control Type: Observed vs Predicted Number of Accidents
Figure 3.19 Surrey Total Model with Control Type: Predicted Accidents vs Estimated Variance
Figure 3.20 Comparison of T-Intersection model with Previous Studies
Figure 4.1 Predicted Accidents using Different Methods to Obtain K
Figure 5.1 Predicted vs. EB Refined Number of Accidents for Total Model
Figure 5.2 Identification of Accident Prone Locations
Figure 5.3 Critical Curves for Total Model
Figure 5.4 Critical Curves for Vancouver Island Model
Figure 5.5 Critical Curves for Lower Mainland Model
Figure 5.6 Critical Curves for Different Values of K
Figure 5.7 Comparison of Critical Accidents for Different K Values
Figure 5.8 Ranking of Top 10 APL for Island Model
Figure 5.9 Staggered T vs 4-leg Intersections Safety Comparison
Figure AI-1 Identification of the Highest Cook's Distance Values for Total Model with Intersection Type
Figure AI-2 Identification of the Highest Cook's Distance Values for T-Intersection Model
Figure AI-3 Identification of the Highest Cook's Distance Values for 4-leg Intersection Model
Figure AI-4 Identification of the Highest Cook's Distance Values for Vancouver Island Model
Figure AI-5 Identification of the Highest Cook's Distance Values for Lower Mainland Model
Figure AI-6 Identification of the Highest Cook's Distance Values for Surrey Total Model
Figure AI-7 Identification of the Highest Cook's Distance Values for Surrey Total Model with Intersection Control Type
Figure AII-1 AR vs. Accident Frequency for Total Model
Figure AII-2 AR vs. Accident Frequency for T-intersection Model
Figure AII-3 AR vs. Accident Frequency for 4-leg intersection Model
Figure AII-4 AR vs. Accident Frequency for Total Model Including Intersection Type
Figure AII-5 AR vs. Accident Frequency for Vancouver Island Model
Figure AII-6 AR vs. Accident Frequency for Lower Mainland Model
Figure AII-7 AR vs. Accident Frequency for Surrey Total Model
Figure AII-8 AR vs. Accident Frequency for Surrey Model with Control Type

ACKNOWLEDGEMENTS

This thesis represents not only my last step toward an M.A.Sc. degree, but also the fulfillment of a dream that I have had for many years: to obtain a graduate degree from a university outside my country. Many people helped me achieve this goal, which has been the most important achievement in my professional career. This has been a real team effort by my family, friends and the faculty staff, who in different ways supported me during these last two years.

First of all, I would like to thank my supervisor, Dr. Tarek Sayed, for all the time he spent explaining to me the ideas and theory behind the phenomenon of traffic accident occurrence, and for encouraging me to keep up my enthusiasm during the different stages of this research. I am also very grateful to him for having given me the opportunity to work as a research assistant on two of his projects.

I want to thank the Insurance Corporation of British Columbia (ICBC) for providing the financial support for this research and Delcan Corporation for providing the accident data used to develop the accident prediction models. Without them, completing this thesis would have been impossible. I am also grateful to the Instituto Colombiano de Fomento para Estudios Tecnicos en el Exterior, ICETEX, for its financial support during my first year in Canada. I also thank my cousin Felipe Hernandez for the whole week that he spent correcting my English and sometimes even my Spanish.

Last but not least, I want to dedicate this work and milestone to the people who have given me the greatest support in my life, my father Luis and my mother Cecilia. Thank you guys for showing me the way to get where I am now. I have learned from you that despite all difficulties and sometimes frustrations, there are no impossible goals to reach. I feel very happy to tell you: "¡lo logré!" (I did it!).
CHAPTER I
INTRODUCTION

1.1 Background

Since the dawn of the automobile age about a century ago, traffic safety problems have been a serious concern: an enormous economic and human toll has been exacted as a result of the public's ongoing love affair with the motor vehicle. It is commonly accepted that there are many costs associated with vehicular mobility, such as air pollution, noise, and accidents. However, the economic and social costs of road accidents greatly exceed the other mobility costs, owing to the loss of property, injury, pain, grief and deaths attributed to road accidents.

In British Columbia, 500 people are killed and 50,000 injured as a result of road accidents. The annual direct claim costs of the Insurance Corporation of British Columbia (ICBC) due to road accidents are estimated to exceed $2 billion (ICBC, 1996 Annual Report), and these costs are themselves far exceeded by the related social costs. Consequently, the importance of reducing the social and economic costs of road accidents cannot be overstated.

Recognizing the traffic safety problem and the importance of reducing the frequency and severity of road accidents, the majority of road authorities have established Road Safety Improvement Programs (RSIPs). The objective of these programs is to identify accident-prone locations, determine possible causes and countermeasures, and implement the most effective countermeasures in order to alleviate the problems at these locations. The success of these RSIPs can be enhanced by developing statistically reliable accident prediction models, which provide accurate estimates of the traffic safety of road sections and intersections. These safety estimates can be used in identifying accident-prone locations and evaluating the effectiveness of remedial measures.
The main objective of this thesis is to develop accident prediction models for estimating the safety potential of urban unsignalized intersections as functions of the traffic volumes on both major and minor roads and of the type of intersection (T or 4-leg). The data used for this thesis included accident records and traffic volume data for intersections located in the GVRD and in urban areas of Vancouver Island. The methodology used to derive these models is based on the Generalized Linear Regression Models (GLIM) approach, which addresses and overcomes the problems associated with conventional linear regression; several researchers have shown that conventional linear regression lacks the distributional properties needed to describe the occurrence of accidents. Potential applications of accident prediction models include identifying and ranking accident-prone locations, before-and-after safety evaluation, and safety planning.

The work reported in this thesis is part of the ongoing research on accident prediction models at the Civil Engineering Department of the University of British Columbia. Models have already been developed for urban signalized intersections (Feng and Sayed, 1997), and models are currently being developed for rural signalized intersections and for urban and rural corridors.

1.2 Thesis Structure

This thesis is divided into six chapters. Chapter One provides an overview of the thesis and its structure. Chapter Two summarizes previous work on accident prediction models, the theoretical background of the GLIM approach, and its application to accident prediction models. Chapter Three describes the accident and traffic volume data used and the models developed. Chapter Four discusses several statistical issues related to the GLIM approach. Chapter Five discusses several applications of the models.
The applications include: identification of accident-prone locations; development of critical accident frequency curves; ranking of accident-prone locations; before-and-after safety evaluation; and the use of the models in safety planning. Chapter Six summarizes the conclusions of the thesis and provides suggestions for follow-up work.

CHAPTER II
LITERATURE REVIEW

2.0 Introduction

The relationship between traffic accidents and traffic volumes has been the subject of numerous studies. Most of the earlier studies used the conventional linear regression approach to develop models relating accidents to traffic volumes. However, the past decade has seen significant developments and advances in accident data analysis and modeling. Accident prediction models are no longer limited to the conventional linear regression approach, as more accurate and less restrictive nonlinear models are now considered. In addition, the use of the Empirical Bayes' approach to refine the estimates obtained from accident prediction models has been an important development. This chapter describes the statistical theory behind accident prediction models, as well as previous research and developments.

2.1 Shortcomings Associated with Conventional Linear Regression Models

The conventional linear regression model is defined as follows:

$$Y_i = a_0 + \sum_{j=1}^{k} a_j X_{ij} + \varepsilon_i \qquad (2.1)$$

where,
Y_i = estimated or dependent variable
a_0, a_j = estimated coefficients
X_ij = independent variables
ε_i = estimated error, assumed to be normally distributed

Several researchers (Jovanis and Chang, 1986; Saccomanno and Buyco, 1988; Miaou and Lum, 1993) have shown that conventional linear regression models lack the distributional properties needed to adequately describe random, discrete, non-negative, and typically sporadic events, which are all characteristics of traffic accidents.
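To make the non-negativity problem concrete, the Python sketch below compares how much probability a normal error structure places on negative (impossible) accident counts with a Poisson model of the same mean and variance. The function names and the mean of 0.8 accidents per period are illustrative assumptions, not values from this thesis; only the standard library is used.

```python
import math


def normal_prob_negative(mean: float, sd: float) -> float:
    """P(Y < 0) for Y ~ Normal(mean, sd).

    This is the probability that a normal error structure assigns to
    negative, i.e. impossible, accident counts.
    """
    return 0.5 * (1.0 + math.erf(-mean / (sd * math.sqrt(2.0))))


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for Y ~ Poisson(lam), defined only on y = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


# Hypothetical low-volume intersection: 0.8 expected accidents per period.
lam = 0.8

# Normal model with the same mean and variance as the Poisson model.
p_negative = normal_prob_negative(lam, math.sqrt(lam))

# Poisson mass over the non-negative integers (truncated far into the tail).
p_nonnegative = sum(poisson_pmf(y, lam) for y in range(50))

print(f"Normal:  P(Y < 0)  = {p_negative:.3f}")
print(f"Poisson: P(Y >= 0) = {p_nonnegative:.6f}")
```

At this low mean, the normal model puts about 19% of its probability on negative counts even though its mean and variance match the Poisson model, whereas the Poisson model is confined to the non-negative integers.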
Jovanis and Chang (1986) identified three shortcomings associated with the assumption of a normal distribution error structure. The first is found in the relationship between the mean and the variance of accident frequency: Jovanis and Chang (1986) demonstrated that as traffic volume increases, so does the variance of accident frequency, whereas under a normal distribution assumption the variance remains constant. The second shortcoming is associated with the non-negativity of accident occurrence. Conventional linear models can predict negative values when the data set contains low accident frequencies. One way to avoid this problem is to use non-linear models, which are linearized through a logarithmic transformation in order to estimate their parameters. The third problem is related to the non-normality of the error distribution, which arises because the dependent variable is discrete, non-negative and small in value. Jovanis and Chang (1986) found that the best way to overcome these problems is to assume a Poisson distribution error structure. They demonstrated their results by modeling accidents on highway sections in Indiana.

Miaou and Lum (1993) identified the same shortcomings when comparing four accident prediction models applied to trucks on highways. Two of these models were developed under the assumption of a normal distribution error structure, while the other two assumed a Poisson distribution. The predicted values from the models using the Poisson assumption were much closer to the observed values, and their estimated coefficients had higher t-statistics, denoting higher significance. For the normally distributed models, some of the estimated coefficients had signs contrary to expectation.
These results confirmed all the shortcomings associated with the conventional linear regression technique and its limited applicability for developing accident prediction models.

2.2 Generalized Linear Models (GLIM)

As seen in the previous section, GLIM has the advantage of overcoming all the shortcomings associated with the conventional linear regression approaches. GLIM also has the flexibility of assuming different error distributions and link functions, which allow nonlinear models to be converted into linear ones. Recognizing these advantages, the GLIM approach is used in this thesis.

The GLIM approach used herein is based on the work of Kulmala (1995) and Hauer et al. (1988). Let Y be a random variable that describes the number of accidents at an intersection in a specific time period, and let y be the observed value of this variable during that period. The mean of Y is Λ, which can itself be regarded as a random variable. Then, for Λ = λ, Y is Poisson distributed with parameter λ:

$$P(Y = y \mid \Lambda = \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}; \qquad E(Y \mid \Lambda = \lambda) = \lambda; \qquad Var(Y \mid \Lambda = \lambda) = \lambda \qquad (2.2)$$

Since each site has its own regional characteristics, with a unique mean accident frequency Λ, Hauer et al. (1988) have shown that for an imaginary group of sites with similar characteristics, Λ follows a gamma distribution with parameters κ and κ/μ, where κ is the shape parameter of the distribution.
That is:

$$f_{\Lambda}(\lambda) = \frac{(\kappa/\mu)^{\kappa}\, \lambda^{\kappa-1}\, e^{-\kappa\lambda/\mu}}{\Gamma(\kappa)} \qquad (2.3)$$

with a mean and variance of:

$$E(\Lambda) = \mu; \qquad Var(\Lambda) = \frac{\mu^{2}}{\kappa} \qquad (2.4)$$

Kulmala (1995) has also shown that the point probability function of Y based on equations (2.3) and (2.4) is given by the negative binomial distribution:

$$P(Y = y) = \frac{\Gamma(\kappa + y)}{\Gamma(\kappa)\, y!} \left(\frac{\kappa}{\kappa + \mu}\right)^{\kappa} \left(\frac{\mu}{\kappa + \mu}\right)^{y} \qquad (2.5)$$

with an expected value and variance of:

$$E(Y) = \mu; \qquad Var(Y) = \mu + \frac{\mu^{2}}{\kappa} \qquad (2.6)$$

As shown in equation (2.6), the variance of observed accidents for the entire sample has two sources: the second term (μ²/κ) comes from the variance of the predicted number of accidents, and the first term (μ) from the random variation of the number of accidents (Kulmala, 1995). Notice that when κ → ∞, the variance in equation (2.6) equals the mean, which is identical to the Poisson distribution.

As described earlier, in the GLIM approach the error structure that best fits accident occurrence is usually assumed to be Poisson or negative binomial. The main advantage of the Poisson error structure is the simplicity of the calculations, because the mean and variance are equal and the estimation method is readily included in the GLIM software package (NAG, 1994). However, this advantage is also a limitation. It has been shown (Kulmala and Roine, 1988; Kulmala, 1995) that most accident data are likely to be overdispersed (the variance is greater than the mean), which indicates that the negative binomial distribution is the more realistic assumption.

Miaou and Lum (1993) identified three possible sources of overdispersion in accident data. The first is related to omitted variables that explain accident occurrence. Traffic accidents depend on numerous variables, including geometric characteristics, weather, time of day, and human factors, many of which are not discernible from accident records. The second possible source of overdispersion is uncertainty in vehicle exposure data, arising from errors during data collection.
The third source comes from non-homogeneous roadway environments, which can explain why accident rates differ between daylight and night-time or between rainy and sunny days.

The main difficulty associated with using the negative binomial error structure is the determination of the shape parameter κ. Kulmala (1995) proposed an iterative approach using the method of moments. The GLIM software package (V 4.0) includes a macro library in which κ is calculated by three different iterative methods: maximum likelihood, the mean deviance estimate, and the mean χ² estimate (NAG, 1996). A comparison of these four methods is provided in Chapter 4. The maximum likelihood method is used in this thesis.

Bonneson and McCoy (1993) proposed a methodology for deciding whether to use a Poisson or a negative binomial error structure. First, the model parameters are estimated based on a Poisson distribution error structure. Second, a dispersion parameter (σ_d) is calculated, defined as:

$$\sigma_d = \frac{\text{Pearson } \chi^{2}}{n - p} \qquad (2.7)$$

where n is the number of observations and p is the number of model parameters (the Pearson χ² test is described in detail in the next section).

If σ_d is greater than 1.0, the data have greater dispersion than is explained by the Poisson distribution, and further analysis using a negative binomial distribution is required. If σ_d is near 1.0, the assumed Poisson error structure fits the data approximately. This method has the advantage of testing the model under the Poisson distribution first, which is easier to estimate than the negative binomial distribution.

2.3 Testing the Models Significance

The significance of GLIM models is usually assessed using the Scaled Deviance (SD) and the Pearson χ² test. The SD is the likelihood ratio test statistic measuring the difference between the log likelihood of the studied model and that of the saturated model (Kulmala, 1995). The general equation for the SD is:

$$SD = 2\log f(y, y) - 2\log f(E(\Lambda), y) \qquad (2.8)$$

where log f(E(Λ), y) is the natural logarithm of the probability density function.

McCullagh and Nelder (1983) have shown that for the Poisson distribution the SD is defined as:

$$SD = 2\sum_{i=1}^{n} y_i \ln\left(\frac{y_i}{E(\Lambda_i)}\right) \qquad (2.9)$$

and for the negative binomial distribution as:

$$SD = 2\sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{E(\Lambda_i)}\right) - (y_i + \kappa) \ln\left(\frac{y_i + \kappa}{E(\Lambda_i) + \kappa}\right) \right] \qquad (2.10)$$

The scaled deviance is asymptotically χ² distributed with n-p-1 degrees of freedom. Therefore, for a well-fitted model with an appropriate link function, error distribution and functional form, the expected value of the SD will approximately equal the number of degrees of freedom (Maycock and Hall, 1984).

Another measure for assessing the significance of GLIM models is the Pearson χ² statistic, defined as (Bonneson and McCoy, 1993):

$$\text{Pearson } \chi^{2} = \sum_{i=1}^{n} \frac{\left[y_i - E(\Lambda_i)\right]^{2}}{Var(y_i)} \qquad (2.11)$$

where y_i is the observed number of accidents at intersection i, E(Λ_i) is the predicted number of accidents obtained from the accident prediction model, and Var(y_i) is the variance of the observed accidents, defined in equations (2.2) and (2.6) for the Poisson and negative binomial distributions, respectively. The Pearson χ² statistic follows the χ² distribution with n-p-1 degrees of freedom, where n is the number of observations and p is the number of model parameters.

In addition, graphical methods provide useful subjective measures of model goodness of fit. One is to plot the predicted accident frequency against the observed accident frequency: for a well-fitted model, the points should cluster symmetrically around the 45° line. A second graphical method is to plot the average of squared residuals against the predicted accident frequency.
For a well-fitted model, all points should lie around the variance function line defined in equation (2.6) for the negative binomial distribution.

Another graphical method is to calculate the Prediction Ratio (PR) and plot it against the predicted values. PR is defined as the normalized residual, that is, the difference between the predicted and observed accidents divided by the standard deviation (Bonneson and McCoy, 1997). PR can be calculated according to the following equation:

$$PR_i = \frac{E(\Lambda_i) - y_i}{\sqrt{Var(y_i)}} \qquad (2.12)$$

For a well-fitted model, the PR_i values should cluster around the zero axis in a plot of PR against predicted accidents.

Finally, the t-ratio test is used to measure the statistical significance of the variable coefficients. The t-ratio is defined as the ratio between the estimated GLIM parameter and its standard error. For a variable to be significant at the 95% level of confidence, the absolute value of the t-ratio should exceed 1.96.

All six tests described in this section were used to assess the significance of the models developed for this thesis.

2.4 Model Structure

Intersection accident prediction models can generally be classified into two types. The first type relates accidents to the sum of the traffic flows entering the intersection, while the second relates accidents to the product of the traffic flows entering the intersection. The latter type has been shown to be more suitable for representing the relationship between accidents and traffic flows at intersections (Hauer et al., 1988). In this kind of structure, accident frequency is a function of the product of traffic flows, each raised to a specific power (usually less than one). This approach has been used in this thesis.
That is:

$$E(\Lambda) = a_0 \times V_1^{a_1} \times V_2^{a_2} \qquad (2.13)$$

where,
E(Λ) = predicted accident frequency
V_1 = major road traffic volume
V_2 = minor road traffic volume
a_0, a_1, a_2 = model parameters

As mentioned earlier, accident occurrence is a function not only of traffic flows but also of other variables (e.g. weather, intersection type, geometric features). Kulmala (1995) and Maher and Summersgill (1996) proposed modeling these additional variables along with traffic flows as follows:

$$E(\Lambda) = a_0 \times V_1^{a_1} \times V_2^{a_2} \times e^{\sum_{j=1}^{m} b_j x_j} \qquad (2.14)$$

where x_j represents any of the m additional variables and b_j is its coefficient.

2.5 Location Specific Prediction: The Empirical Bayes Refinement

There are two types of clues to the safety of a location: its traffic and road geometric design characteristics, and its historical accident data (Hauer, 1992; Brüde and Larsson, 1988). The Empirical Bayes (EB) approach makes use of both. It refines the estimate of the expected number of accidents at a location by combining the observed number of accidents at the location with the predicted number of accidents obtained from the GLIM model, yielding a more accurate, location-specific safety estimate.

The EB estimated number of accidents for any intersection can be calculated using the following equation (Hauer et al., 1992):

$$\text{EB safety estimate} = \alpha \times E(\Lambda) + (1 - \alpha) \times \text{count} \qquad (2.15)$$

where,

$$\alpha = \frac{1}{1 + \dfrac{Var(E(\Lambda))}{E(\Lambda)}} \qquad (2.16)$$

count = observed number of accidents
E(Λ) = predicted number of accidents as estimated from the GLIM model
Var(E(Λ)) = variance of the GLIM estimate

Using the variance of the predicted accidents, Var(E(Λ)) = E(Λ)²/κ, as defined in equation (2.4), equation (2.15) can be rearranged to yield:

$$\text{EB safety estimate} = \frac{\kappa}{\kappa + E(\Lambda)} \times E(\Lambda) + \frac{E(\Lambda)}{\kappa + E(\Lambda)} \times \text{count} \qquad (2.17)$$

In addition, the variance of the EB refined estimate can be calculated using the following equation (Kulmala, 1995):

$$Var(\text{EB safety estimate}) = \frac{E(\Lambda)}{\kappa + E(\Lambda)} \left( \frac{\kappa}{\kappa + E(\Lambda)} \times E(\Lambda) + \frac{E(\Lambda)}{\kappa + E(\Lambda)} \times \text{count} \right) \qquad (2.18)$$

Equation (2.17) shows that the EB refined estimate lies between the observed and the predicted number of accidents, combining both the individual accident history of the location and the GLIM model prediction (Figure 2.1).

The parameter κ also plays an important role in the calculation of the EB estimate. Kulmala (1995) showed that for high values of κ, the variance of the predicted accidents is low (equation 2.4); the uncertainty is therefore small and the EB estimate is closer to the GLIM estimate. Conversely, when κ is low, the variance of the predicted value is high, as is the uncertainty of the GLIM model, so the EB estimate is closer to the observed value. Figure 2.1 shows how the value of κ affects the EB estimate.

[Figure: EB estimate plotted against the k value (10 to 100), shown together with the observed number of accidents (11 acc/3 years) and the predicted number of accidents (6.88 acc/3 years).]
Figure 2.1 Empirical Bayes' Estimate for Different k Values

In addition to combining the two types of safety clues and providing site-specific safety estimates, the EB procedure has been shown to significantly reduce the regression-to-the-mean effects that are inherent in observed accident counts (Brüde and Larsson, 1988). Regression to the mean is a statistical phenomenon whereby a randomly large number of accidents at a given entity during a before period is normally followed by a smaller number of accidents during a similar after period, even if no measures have been implemented (the opposite applies to a randomly small number of accidents).
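Equations (2.15) to (2.18) can be sketched in a few lines of Python. This is a minimal illustration, not the computation used in this thesis: the function names are hypothetical, the predicted (6.88) and observed (11) counts are the values labelled in Figure 2.1, and the κ values are illustrative. The weight α uses the simplification κ/(κ + E(Λ)) that follows from substituting equation (2.4) into equation (2.16).

```python
def eb_weight(predicted: float, kappa: float) -> float:
    """Weight alpha of equation (2.16).

    With Var(E(Lambda)) = E(Lambda)**2 / kappa from equation (2.4),
    1 / (1 + Var / E) reduces to kappa / (kappa + E(Lambda)).
    """
    return kappa / (kappa + predicted)


def eb_estimate(predicted: float, observed: float, kappa: float) -> float:
    """EB safety estimate of equations (2.15) and (2.17)."""
    alpha = eb_weight(predicted, kappa)
    return alpha * predicted + (1.0 - alpha) * observed


def eb_variance(predicted: float, observed: float, kappa: float) -> float:
    """Variance of the EB estimate, equation (2.18)."""
    ratio = predicted / (kappa + predicted)  # equals 1 - alpha
    return ratio * (kappa / (kappa + predicted) * predicted + ratio * observed)


# Counts labelled in Figure 2.1 (accidents per 3 years); kappa values are illustrative.
predicted, observed = 6.88, 11
for kappa in (10, 25, 50, 100):
    est = eb_estimate(predicted, observed, kappa)
    sd = eb_variance(predicted, observed, kappa) ** 0.5
    print(f"kappa={kappa:>3}: EB estimate = {est:.2f} acc/3 years (sd {sd:.2f})")
```

As κ increases, α approaches 1 and the estimate moves from the observed count toward the GLIM prediction, which is the behaviour described for Figure 2.1. The variance function gives the same value as [E(Λ)/(κ + E(Λ))]² × (κ + count), an algebraically equivalent form of equation (2.18).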
The EB refinement is important for various applications of GLIM models, such as the identification and ranking of accident-prone locations and the assessment of the effectiveness of safety measures. When combined with reliable GLIM models, the EB estimate also avoids the difficulties associated with defining reference groups for before-and-after studies (Mountain and Fawaz, 1996).

2.6 Previous Work
There are few studies dealing with accident prediction models at junctions, even though most accidents occur at these kinds of locations.

Satterthwaite (1981) made an extensive review of over 80 studies dealing with the relationship between traffic accidents and traffic volumes. Most of the models reported in this review consider accidents on road sections, and only 14 of the references deal with accident models at intersections.

For accidents at intersections, Satterthwaite (1981) found some non-linear relationships between accidents and traffic volumes at T-intersections located in rural areas. The proposed models are disaggregated into accidents involving vehicles turning left and right from the minor road (non-through road) onto the major road (through road). The relationships found are similar to equation (2.13). In one study, the a1 and a2 coefficients were both approximately 0.5. In a subsequent study, a1 was approximately 1 while a2 was again approximately 0.5. These models were developed during the 1950s and 1960s and were estimated using conventional regression analysis.

The review also reported similar studies conducted at other intersections in which traffic control and layout variables were taken into account. However, none of the reported studies dealt with accident prediction models at urban unsignalized intersections.
The review concluded that the results concerning accidents at intersections were not consistent, and it suggested that more research should be done.

Bonneson and McCoy (1993), using data from 125 two-way stop-controlled intersections in Minnesota, developed the following model:

$$\text{Accidents/year} = 0.692 \times \left(\frac{\text{AADT}_{\text{major road}}}{1000}\right)^{[\text{illegible}]} \times \left(\frac{\text{AADT}_{\text{minor road}}}{1000}\right)^{[\text{illegible}]} \tag{2.19}$$

Using a similar approach, Belanger (1994) developed several models using data from 149 4-leg unsignalized intersections in western Quebec. These included "total-accidents models" for different speed ranges and "accident-type models" for right-angle, rear-end and other accident types. There were also models including other variables, such as the presence of flashing beacons, sight distance and turning lanes. For instance, the total-accidents model for all speeds developed by Belanger is as follows:

$$\text{Accidents/year} = 0.00193 \times (\text{AADT}_{\text{major road}})^{[\text{illegible}]} \times (\text{AADT}_{\text{minor road}})^{0.51} \tag{2.20}$$

Both the Bonneson and McCoy model and the Belanger model were developed for intersections in rural areas, assuming a negative binomial error distribution.

In a more recent study, Maher and Summersgill (1996) used data recorded throughout the UK to develop the following model for T-intersections on urban single carriageways, based on the negative binomial distribution:

$$\text{Accidents/year} = 0.049 \times \left(\frac{\text{AADT}_{\text{major road}}}{1000}\right)^{[\text{illegible}]} \times \left(\frac{\text{AADT}_{\text{minor road}}}{1000}\right)^{0.36} \tag{2.21}$$

In addition, Mountain and Fawaz (1996), using the same negative binomial approach, derived a model for 390 unsignalized intersections located in 12 UK counties. Of the 390 intersections, 338 were T-intersections and approximately 35% were located in urban areas.
The model developed is as follows:

$$\text{Accidents/year} = 0.141 \times \left(\frac{\text{AADT}_{\text{major road}}}{1000}\right)^{0.64} \times \left(\frac{\text{AADT}_{\text{minor road}}}{1000}\right)^{0.24} \tag{2.22}$$

Since this thesis deals only with intersections located in urban areas, only the models in equations (2.21) and (2.22) will be compared with the models developed in this thesis. This comparison is presented in the next chapter.

2.7 Conclusion
Developing accident prediction models has been a concern for the last four decades. During the 1950s, 1960s and 1970s, the models were limited by their use of conventional linear regression analysis. This led to inconsistencies and misinterpretation in describing the occurrence of traffic accidents. Advances in computer and software technology during the last two decades, together with the development of more sophisticated statistical tools, have resulted in the release of software packages such as GLIM and SAS. These packages can fit non-linear regression models with an error structure chosen to be consistent with the data.

Several researchers have found that accident counts are better described by the negative binomial distribution than by the Poisson distribution, because the negative binomial distribution accommodates overdispersion (variance greater than the mean).

Only a few studies were found that use GLIM to develop safety models for unsignalized intersections, and most of them deal with the rural environment. Most GLIM accident prediction models have been developed during the last 10 years and focus on signalized intersections, rural areas and road sections. More work is needed in developing accident prediction models for urban unsignalized intersections.

CHAPTER III
DATA COLLECTION AND MODEL DEVELOPMENT

3.0 Introduction
This chapter is divided into three sections. The first section contains a detailed description of the data used to develop the accident prediction models.
It also includes a procedure to identify outliers that may affect the quality of the models. The second section describes the models developed and their goodness of fit. Finally, the third section compares the developed models with similar models found in the literature.

3.1 Data Collection
This thesis made use of sample accident and traffic volume data for unsignalized (both T and 4-leg) intersections located in urban areas of Greater Vancouver and Vancouver Island.

3.1.1 Accident and Traffic Volume Data
Three years of accident data (1993-1995) were available for each intersection. The source of the accident data is the MV 104 accident reporting form, which is British Columbia's police accident report. The data set contained 427 intersections from the cities of Surrey, Victoria, Coquitlam, Vancouver, Burnaby and Nanaimo. The information available for each intersection includes the total number of accidents that occurred during the 1993-1995 period. The explanatory variables of accident occurrence included the traffic volumes on both the major and minor roads, given as Average Annual Daily Traffic (AADT), and the type of intersection (T or 4-leg).

Another explanatory variable taken into account in this thesis is the type of intersection control, which was only available for Surrey intersections. Traffic control types included 2-way Stop, 4-way Stop, and one-way Stop at T-intersections. Tables 3.1 and 3.2 provide a statistical summary of the data.

             Number of Intersections   Number of Accidents       Average AADT
City         Total     T    4-leg      Acc/year   Acc/yr/Int.    Major Road   Minor Road
Surrey          56    18       38           285          5.08        17,937        3,075
Victoria       340   162      178           360          1.06        12,355        1,494
Nanaimo         10     0       10            23          2.25         7,172        3,242
Coquitlam        8     2        6            34          4.25        10,004        1,514
Burnaby          9     3        6            36          4.04        12,984        2,837
Vancouver        4     1        3            17          4.33        22,191        1,408
Total          427   186      241           755          1.77        13,186        1,770

Table 3.1 Summary of Accident, Intersection Control and Traffic Volume Data

             Number of Accidents*       AADT Minor Road               AADT Major Road
City         Max.   Min.   Std. Dev.    Max.     Min.    Std. Dev.    Max.     Min.    Std. Dev.
Surrey       11.0   1.7    2.3          9,300    500     2,060        42,600   2,100   10,385
Victoria      8.3   0.0    1.2          11,000   100     1,483        47,800   500     9,397
Nanaimo       4.8   0.6    1.5          6,025    1,968   1,307        15,739   2,771   4,132
Coquitlam     8.7   1.0    2.7          2,360    730     542          32,310   730     10,642
Burnaby      10.3   0.3    3.0          7,415    365     2,252        29,020   5,715   7,492
Vancouver     8.3   0.3    3.5          2,550    860     775          37,295   7,835   12,070
Total        11.0   0.0    2.1          11,000   100     1,673        47,800   500     9,673
* Indicates average annual accidents per intersection

Table 3.2 Statistical Summary of Accidents

As shown in Table 3.1, the average number of accidents per intersection is much higher for the cities located in the Lower Mainland (Surrey, Coquitlam, Burnaby and Vancouver) than for the cities located on Vancouver Island (Victoria and Nanaimo). About 44% of the intersections are T-intersections, while the remaining 56% are 4-leg intersections. Neither intersection type therefore predominates in the database. This differs from the studies by Mountain and Fawaz (1996) and Maher and Summersgill (1996), whose data sets consisted mainly of T-intersections. A balanced mix is desirable when developing a total accident model, because the model will not be biased in favour of one of the intersection types.

As previously mentioned, intersection control type data is available only for Surrey intersections.
Of the 56 intersections, 32 are two-way Stop controlled, 8 are 4-way Stop controlled, and the remaining 16 are classified as one-way Stop T-intersections.

3.1.2 Outlier Analysis
Outliers are defined as data points that are very different from the rest of the data (Stevens, 1986). They can be caused by irregularities or errors during data recording or observation, or they may be genuinely different from the rest of the data. These points deserve further investigation in order to decide whether or not to remove them.

Kulmala (1995) proposed a procedure to identify outliers based on the leverage statistic. The leverage of a point is a measure of how far the x-value of the point is from the average of the rest of the x-values (NAG, 1994). The leverage values are the diagonal elements of the hat matrix, which is the matrix that multiplies the observed vector to yield the predicted vector. One property of the leverage values h_i is that their sum over the n observations equals the number of parameters, p, in the model. The average leverage is therefore p/n, and many authors (NAG, 1994; Stevens, 1988) consider a leverage that exceeds 2p/n to be high and subject to further examination.

However, it has been shown (NAG, 1994) that the leverage alone is not a good indication of whether the parameter estimates are being affected by specific observations. A measure that does provide this indication is the Cook's distance (NAG, 1994), which quantifies the influence of each observation on the model. The higher the Cook's distance for a given observation, the stronger its influence on the model.
The Cook's distance is calculated as follows:

$$c_i = \frac{1}{p} \times \frac{h_i}{1 - h_i} \times r_i^2 \tag{3.1}$$

where h_i = leverage value; p = number of parameters; r_i = standardized residual.

The main disadvantage of the Cook's distance is that there is no clear rule for what constitutes a high c_i. NAG (1994) proposes a stepwise procedure. The data are first sorted by Cook's distance, and the points with the highest values are then removed one at a time, assessing the change in the scaled deviance after each removal.

Maycock and Hall (1984) found that the difference in scaled deviance between two models with df1 and df2 degrees of freedom is χ² distributed with (df1 - df2) degrees of freedom. This means that if only one point with a high Cook's distance is removed, the drop in scaled deviance must exceed 3.8 (the χ² value for a 95% confidence level and 1 degree of freedom) for the removal to be significant.

GLIM can extract both leverage and Cook's distance values from each model. The procedure used in this thesis to identify outliers begins with a visual examination of the relationship between the observed number of accidents at each intersection and the Cook's distance. Intersections with exceptionally large values of c_i are then removed and the change in scaled deviance is determined. If this change is significant, the intersections are removed.

This analysis was performed for all models in this thesis, and none of the critical points were classified as outliers that should be removed. Figure 3.1 and Table 3.3 show the results of this procedure for the total accident model. From visual examination of Figure 3.1, five intersections with a Cook's distance greater than 0.02 were selected for possible removal (tagged 1 through 5 in the figure). As shown in Table 3.3, the cumulative drop in scaled deviance is always below the corresponding χ² critical value.
This indicates that removing these intersections from the data set is not warranted. The results for the remaining models are summarized in Appendix I.

[Figure 3.1 Identification of the Highest Cook's Distance Values for Total Model: Cook's distance (0.00 to 0.06) vs. observed accidents (acc/3 years, 0 to 35); the five highest values are tagged 1 to 5]

Rank (Cook's   Intersection   Sample   Scaled     SD     Cumulative   χ² (95%)
Distance)      Number         Size     Deviance   Drop   SD Drop
1              14             426      397.4      1.5    1.5          3.8
2              22             425      395.3      2.1    3.5          6.0
3              157            424      393.2      2.1    5.7          7.8
4              5              423      391.0      2.2    7.8          9.5
5              33             422      389.0      2.0    9.8          11.1

Table 3.3 Identification of Outliers for Total Model

3.2 Model Development
The main task of this research is to develop multivariate models to estimate the predicted number of accidents. Four categories of models were developed in this thesis: (1) models for the total number of accidents; (2) models for T and 4-leg intersections; (3) separate models for each region (Vancouver Island, the Lower Mainland, and Surrey); and (4) a model for Surrey including intersection control type.

Since the average number of accidents per year per intersection is relatively small (especially for intersections in Victoria and Nanaimo), it was decided to use the number of accidents in a three-year period.

The models are assumed to follow the negative binomial distribution, which is available in the GLIM software package through a macro designed by NAG (1996). Of the six goodness-of-fit tests described in section 2.3, the graphs describing the predicted accidents vs.
the Prediction Ratio for every model are shown in Appendix II, while the remaining tests are presented with the description of each model. In general, the Prediction Ratio graphs show dispersions similar to those in the observed vs. predicted accident graphs. Appendix III shows the GLIM output of all models. In addition to the model parameters, this output includes the scaled deviance, the K value (represented by THETA in the GLIM output), and the standard errors of the parameters. Note that the model under the Poisson distribution assumption was developed first.

3.2.1 Model for the Total Number of Accidents
A model relating the total number of accidents to the traffic volumes on the major and minor roads was developed. The whole data set is used for this analysis. Table 3.4 shows the parameter estimates of the model and its goodness of fit.

$$\text{Acc/3 yrs} = 1.4929 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.3839} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.7044}$$

Parameter   t-ratio
a0          3.2
a1          7.8
a2          12.4
SD (dof) = 399 (424); K = 1.97; Pearson χ² (χ² test)* = 459 (472)
* Denotes significance at a 95-percent confidence level

Table 3.4 Model for the Total Number of Accidents

The Pearson χ² value (459) is below the 95% critical value (472), indicating an acceptable fit. The t-ratios are significant for all the variables included in the model, and the scaled deviance (399) is smaller than the number of degrees of freedom (424). Figure 3.2 shows the relationship between the observed and predicted number of accidents for the model. The results are clustered reasonably symmetrically around the 45° line, which is desirable. In addition, Figure 3.3 shows the fit of the variance of the observed accidents (assuming a negative binomial distribution) to the average squared residuals. Each point represents the average predicted accident frequency and average squared residual for a group of intersections sorted by predicted accident frequency (e.g. the first twenty intersections). The figure shows a reasonably good fit.
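The Python sketch below shows how the fitted total model of Table 3.4 could be evaluated, and how the grouped variance check behind Figure 3.3 could be reproduced. It rests on two assumptions. First, the variance of an observed count under the negative binomial model is taken as E(A) + E(A)²/K, which is the standard form and is consistent with equation (2.17), but it is not stated in this section. Second, the intersections are grouped in sets of 20, following the example in the text. The function names are hypothetical.

```python
from statistics import mean

# Negative binomial total model (Table 3.4); predictions are accidents per 3 years.
A0, A1, A2 = 1.4929, 0.3839, 0.7044
K_TOTAL = 1.97


def predict_total(aadt_major, aadt_minor):
    """E(A) = a0 * (V1/1000)^a1 * (V2/1000)^a2 for the total model."""
    return A0 * (aadt_major / 1000) ** A1 * (aadt_minor / 1000) ** A2


def grouped_variance_check(predicted, observed, k, group_size=20):
    """Sort sites by prediction and split them into consecutive groups.

    For each group, return (mean prediction, mean squared residual,
    assumed NB variance mu + mu^2 / k at the mean prediction).
    """
    pairs = sorted(zip(predicted, observed))
    points = []
    for start in range(0, len(pairs), group_size):
        group = pairs[start:start + group_size]
        mu = mean(p for p, _ in group)
        sq_resid = mean((o - p) ** 2 for p, o in group)
        points.append((mu, sq_resid, mu + mu ** 2 / k))
    return points


if __name__ == "__main__":
    # Average AADTs from Table 3.1, used only as an illustrative input.
    print(round(predict_total(13186, 1770), 2))
```

If the model's variance assumption holds, the second and third values of each tuple should track each other, which is what Figure 3.3 displays graphically.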
[Figure 3.2 Total Model: Observed vs. Predicted Number of Accidents (acc/3 yrs)]
[Figure 3.3 Total Model: Predicted Accidents vs Estimated Variance (predicted accidents, acc/3 yrs)]

Figure 3.2 also indicates that intersections in the Lower Mainland are generally different from those on Vancouver Island. Therefore, separate models for the Lower Mainland and Vancouver Island intersections should be developed.

3.2.2 Models for T and 4-leg Intersections
There are two ways to approach these kinds of models. The first is to develop separate models for T and 4-leg intersections. Alternatively, a single model can be developed from the entire sample, as for the total model, with the intersection type variable (T or 4-leg) included in the model.

Using the first approach, the sample size is 186 for T-intersections and 241 for 4-leg intersections. Table 3.5 shows the parameter estimates for each model, as well as the goodness-of-fit tests. Both models have a relatively good fit with respect to the scaled deviance, and the Pearson χ² values are below the 95% critical values. The t-ratios for all the independent variables are significant, which indicates that the models depend on the explanatory variables rather than only on a constant coefficient. This is also desirable.

T-intersection model:
$$\text{Acc/3 yrs} = 0.9333 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.4531} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.5856}$$

4-leg intersection model:
$$\text{Acc/3 yrs} = 1.6947 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.4099} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.7065}$$

Model                Parameter   t-ratio   SD (dof)    K      Pearson χ² (χ² test)*
T-intersection       a0          -0.3      164 (183)   2.34   205 (214)
                     a1           5.5
                     a2           7.4
4-leg intersection   a0           3.6      230 (238)   2.17   251 (274)
                     a1           6.8
                     a2           9.1
* Denotes significance at a 95-percent confidence level

Table 3.5 Models for T and 4-leg Intersections

For both models, Figures 3.4 through 3.7 show the relationships between the observed and predicted number of accidents, and the fit of the variance of the observed accidents to the average squared residuals. The results for both models are symmetrically clustered around the 45° line, and the average squared residuals fit the variance equation well.

[Figure 3.4 T-Intersection Model: Observed vs. Predicted Number of Accidents]
[Figure 3.5 T-Intersection Model: Predicted Accidents vs Estimated Variance]
[Figure 3.6 4-leg Intersection Model: Observed vs. Predicted Number of Accidents (points by city: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo)]
[Figure 3.7 4-leg Intersection Model: Predicted Accidents vs Estimated Variance]

Using the second approach, a total model including the effect of intersection type was developed as a single equation covering both T and 4-leg intersections. This model follows the structure described in equation (2.14). Table 3.6 shows the estimation results of this model as well as the goodness-of-fit tests. The variable Type, which indicates the intersection type, has two values: 1 for T-intersections and 2 for 4-leg intersections.
According to the results in Table 3.6, all variables are significant in the model. The scaled deviance is also close to the degrees of freedom, and the Pearson χ² value is below the 95% critical value.

$$\text{Acc/3 yrs} = 0.5776 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.4221} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.6480} \times e^{0.5379 \times \text{Type}}$$

Parameter   t-ratio
a0          -2.8
a1           8.7
a2          11.7
b1           6.3
SD (dof) = 394 (423); K = 2.23; Pearson χ² (χ² test)* = 449 (471)
* Denotes significance at a 95-percent confidence level

Table 3.6 Total Model Including the Effect of Intersection Type

A brief comparison between this model and the total model developed in section 3.2.1 shows a smaller scaled deviance and Pearson χ² for the model including intersection type. Note that reducing the degrees of freedom by 1 led to a drop in scaled deviance of 5. This drop is greater than 3.8, the 95-percent value of the χ² distribution with 1 degree of freedom. This indicates that including relevant explanatory variables, such as intersection type, can significantly improve the fit.

Figures 3.8 and 3.9 show, respectively, the relationship between the observed and predicted number of accidents, and the fit of the observed accidents variance to the average squared residuals. The points in Figure 3.8 are closer to the 45° line than those displayed in Figure 3.2 (Total Model). Figure 3.9 also shows that the model's average squared residuals tend to follow the variance equation.

[Figure 3.8 Total Model with the Effect of Intersection Type: Observed vs. Predicted Number of Accidents]
[Figure 3.9 Total Model with the Effect of Intersection Type: Predicted Accidents vs Estimated Variance]

Table 3.7 compares the two approaches developed in this section for the intersection type model. The analysis used the entire sample of this study (427 intersections). The number of predicted accidents was calculated using both the separate models and the total model including intersection type. The results show that the single model provides a slightly lower Pearson χ² and sum of squared errors. This indicates that the total model with the intersection type variable performs slightly better than the two separate models. However, the difference between the two approaches is not significant, since the single model gives the closer estimate for only 51% of the intersections.

Parameters                      Separate Models   Single Model
Pearson χ²                      456               449
χ² test (χ² 0.05, n-p)          472               471
Sum of Squared Errors           12,117            11,882
Closer Estimates                208 (49%)         219 (51%)

Table 3.7 Intersection Type Model: 2 Separate Models vs Single Model

Figure 3.10 shows the predicted accidents as a function of major road traffic volume for both approaches. For 4-leg intersections, the separate model curve is slightly above the single model curve. For T-intersections, both curves are practically the same at low major road traffic volumes (AADT < 12,000 veh/day). At high traffic volumes, however, the separate model curve is again slightly above the single model curve.

The results of the comparison show that a single model seems to be slightly more accurate than separate models, but the differences between the two approaches are relatively small. Therefore, using either single or separate models will yield practically the same results, and both approaches are valid.
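The comparison in Figure 3.10 can be reproduced by evaluating both approaches side by side. The Python sketch below uses the parameters of Tables 3.5 and 3.6 together with the Type coding given in the text (1 = T, 2 = 4-leg). The minor-road AADT of 2,000 veh/day and the list of major-road volumes are hypothetical inputs chosen only for illustration, and the helper names are also hypothetical.

```python
import math

# Single model with intersection type (Table 3.6); Type = 1 (T) or 2 (4-leg).
SINGLE = {"a0": 0.5776, "a1": 0.4221, "a2": 0.6480, "b1": 0.5379}

# Separate models for each intersection type (Table 3.5).
SEPARATE = {
    "T": {"a0": 0.9333, "a1": 0.4531, "a2": 0.5856},
    "4-leg": {"a0": 1.6947, "a1": 0.4099, "a2": 0.7065},
}
TYPE_CODE = {"T": 1, "4-leg": 2}


def _flow_term(p, aadt_major, aadt_minor):
    """a0 * (V1/1000)^a1 * (V2/1000)^a2, shared by both model forms."""
    return p["a0"] * (aadt_major / 1000) ** p["a1"] * (aadt_minor / 1000) ** p["a2"]


def predict_single(aadt_major, aadt_minor, int_type):
    """Accidents per 3 years from the single model (equation 2.14 form)."""
    b1 = SINGLE["b1"]
    return _flow_term(SINGLE, aadt_major, aadt_minor) * math.exp(b1 * TYPE_CODE[int_type])


def predict_separate(aadt_major, aadt_minor, int_type):
    """Accidents per 3 years from the separate T or 4-leg model."""
    return _flow_term(SEPARATE[int_type], aadt_major, aadt_minor)


if __name__ == "__main__":
    minor = 2000  # hypothetical minor-road AADT for illustration
    for major in (5000, 12000, 20000, 30000):
        for t in ("T", "4-leg"):
            print(major, t,
                  round(predict_single(major, minor, t), 2),
                  round(predict_separate(major, minor, t), 2))
```

In the single model, the ratio of 4-leg to T predictions is e^0.5379 (about 1.71) at every traffic volume. The separate models let that ratio change with volume, because their exponents differ.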
[Figure 3.10 Intersection Type Model: Separate vs. Single Models. Predicted accidents vs. major road AADT (thousands, 0 to 40) for Single T-Int, Single 4-leg, Separate T-Int and Separate 4-leg]

In addition, Figure 3.10 shows that T-intersections are approximately 50% safer than 4-leg intersections. A more detailed analysis of this point is presented in Chapter 5.

3.2.3 Regional Models
As described earlier, intersections in the Lower Mainland are generally different from those on Vancouver Island. Three regional models were therefore developed: (1) a model for the Lower Mainland, which comprises intersections located in the cities of Surrey, Coquitlam, Vancouver and Burnaby; (2) a model for Vancouver Island, which comprises intersections located in Victoria and Nanaimo; and (3) a model for Surrey. A separate model was developed for Surrey because it has the highest average number of accidents per intersection.

The sample sizes are 77, 350 and 56 for the Lower Mainland, Vancouver Island and Surrey models, respectively. Table 3.8 shows the results of each model. For all models, the Pearson χ² values are below the 95% critical values, and the scaled deviance is smaller than, or close to, the degrees of freedom. The t-ratios are significant at the 95% confidence level for all parameters except the major road traffic volume in the Surrey model, which is significant only at the 90% confidence level. A larger sample size or additional explanatory variables are therefore suggested for the Surrey model in order to obtain a more reliable model.

Vancouver Island model:
$$\text{Acc/3 yrs} = 1.3807 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.3042} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.5488}$$

Lower Mainland model:
$$\text{Acc/3 yrs} = 6.5929 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.2011} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.2864}$$

Surrey model:
$$\text{Acc/3 yrs} = 8.4401 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.1516} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.1907}$$

Model              Parameter   t-ratio   SD (dof)    K      Pearson χ² (χ² test)*
Vancouver Island   a0          2.6       302 (347)   2.92   383 (390)
                   a1          6.0
                   a2          9.0
Lower Mainland     a0          7.6       81 (74)     6.27   76 (94)
                   a1          2.4
                   a2          3.7
Surrey             a0          8.7       56 (53)     8.89   58 (70)
                   a1          1.8
                   a2          2.3
* Denotes significance at a 95-percent confidence level

Table 3.8 Regional Models

Figures 3.11 through 3.16 show the relationships between the observed and predicted number of accidents, and the fit of the variance of the observed accidents to the average squared residuals. The results for the Vancouver Island and Lower Mainland models (Figures 3.11 through 3.14) are symmetrically clustered around the 45° line, and the average squared residuals follow the variance to a satisfactory extent. For the Surrey model, the results in Figures 3.15 and 3.16 show a larger dispersion. This confirms the need either to use a larger sample or to add more explanatory variables to the model.

[Figure 3.11 Vancouver Island Model: Observed vs Predicted Number of Accidents]
[Figure 3.12 Vancouver Island Model: Predicted Accidents vs Estimated Variance]
[Figure 3.13 Lower Mainland Model: Observed vs Predicted Number of Accidents]
[Figure 3.14 Lower Mainland Model: Predicted Accidents vs Estimated Variance]
[Figure 3.15 Surrey Total Model: Observed vs Predicted Number of Accidents]
[Figure 3.16 Surrey Total Model: Predicted Accidents vs Estimated Variance]

Figure 3.17 compares the total model estimated in section 3.2.1 with the three regional models. As expected, the total model lies between the Vancouver Island and the Lower Mainland models. The total model curve is closer to the Vancouver Island model because more than 80% of the data comes from the cities of Victoria and Nanaimo.

[Figure 3.17 Comparison of Total Model with Regional Models: predicted accidents vs. major road AADT (thousands) for the Total, Surrey, Vancouver Island and Lower Mainland models]

3.2.4 Effect of Intersection Control Type
Since data on intersection control type were only available for Surrey intersections, a Surrey total model including control type was estimated. As mentioned earlier, of the 56 Surrey intersections, 32 are classified as 2-way Stop controlled and 8 as 4-way Stop controlled. The remaining 16 are one-way Stop controlled T-intersections. The type variable in the equation takes the values 1, 2 and 3, respectively, for these control types.

Table 3.9 shows the results of this model. The a0 parameter is much more significant than the three variables included in the model, and it also has a relatively high value. This is considered a deficiency, since it indicates that the number of accidents is less dependent on traffic volumes and the control type.
The t-ratio for the control type is not significant. As with the previous Surrey total model, a larger sample size is therefore suggested for developing this model.

$$\text{Acc/3 yrs} = 8.8906 \times \left(\frac{\text{AADT}_{\text{major}}}{1000}\right)^{0.1645} \times \left(\frac{\text{AADT}_{\text{minor}}}{1000}\right)^{0.2256} \times e^{-0.0699 \times \text{Type}}$$

Parameter   t-ratio
a0           8.8
a1           2.0
a2           2.6
b1          -1.0
SD (dof) = 55 (52); K = 9.01; Pearson χ² (χ² test)* = 58 (69)
* Denotes significance at a 95-percent confidence level

Table 3.9 Surrey Total Model with Control Type

Figures 3.18 and 3.19 show the relationship between the observed and predicted number of accidents, and the fit of the variance of the observed accidents to the average squared residuals. In both figures the points are widely dispersed around the lines, which indicates a relatively poor fit.

[Figure 3.18 Surrey Total Model with Control Type: Observed vs Predicted Number of Accidents]
[Figure 3.19 Surrey Total Model with Control Type: Predicted Accidents vs Estimated Variance]

3.3 Comparison with Previous Results
As mentioned in Chapter 2, few studies have developed accident prediction models for urban unsignalized intersections. Therefore, this section only compares the models developed herein with those developed by Maher and Summersgill (1996) and by Mountain and Fawaz (1996). Since the databases used to obtain those models consisted mainly of T-intersections, the comparison uses the separate T-intersection model described in section 3.2.2.

Figure 3.20 shows the results of these three models for a constant minor road traffic volume of 2,000 vehicles per day. The T-intersection model developed in this thesis predicts higher accident frequencies than the other two models.
The difference in results may be attributed to the fact that Maher and Summersgill's model included only T-intersections on urban single carriageways, while Mountain and Fawaz's model includes both urban and rural intersections. In addition, regional characteristics and accident reporting practices differ between the UK and British Columbia (different reporting limits, police attendance, etc.).

[Figure 3.20 Comparison of T-Intersection Model with Previous Studies: predicted accidents vs. major road AADT (thousands, 0 to 40) for the T-Intersection Model, Maher and Summersgill, and Mountain and Fawaz]

3.4 Conclusion
Using the negative binomial distribution approach, eight different accident prediction models were developed. The first model included the entire data set and related accident frequency to the traffic volumes on the major and minor roads. The remaining models were classified according to characteristics such as intersection type (T and 4-leg intersections), region and intersection control type.

According to the various quality tests performed in this chapter, six of the eight models showed a good statistical fit. The two models that showed a poor fit (the two Surrey models) were characterized by smaller sample sizes. For these models, it was suggested to increase the sample size or to include more explanatory variables.

For the intersection type model, two different approaches were used: (1) developing a separate model for each intersection type, and (2) developing a single model that includes intersection type as one of the variables. The differences between these two approaches were relatively small, and the effect of intersection type can be measured using either approach.
Finally, the data set was screened for outliers using Cook's distance values. The procedure indicated that there were no outliers in the data.

CHAPTER IV STATISTICAL CONSIDERATIONS

4.0 Introduction

This chapter discusses several statistical issues. The first relates to the error structure distribution. As described earlier, in the GLIM approach the error structure is usually assumed to be Poisson or negative binomial, and the two error structures are compared here. The second issue relates to the method of calculating the parameter K of the negative binomial distribution, and several approaches to calculating K are compared.

4.1 Poisson vs. Negative Binomial Distribution Error Structure

As mentioned in Chapter Two, the dispersion parameter ($\sigma_d$, defined in equation 2.7) can be used to decide whether to use the Poisson or the negative binomial error structure. If the dispersion parameter of the Poisson model is greater than one, adopting the negative binomial distribution is recommended.

The Poisson distribution was used as a first step in developing all eight models discussed in Chapter 3. Appendix 2 shows the GLIM session results for the Poisson distribution, and Table 4.1 compares the two approaches. Note that the dispersion parameter of the Poisson distribution is considerably high for all models. It ranges from 2.57 for the Vancouver Island model to 4.38 for the 4-leg intersection model. These high values are consistent with the Pearson χ² tests, which none of the Poisson models passes at the 95% level (Table 4.1). This indicates that for all models the data have greater dispersion than the Poisson distribution can explain, and a negative binomial error structure must be assumed.
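The dispersion check can be sketched in Python. The sketch assumes that equation 2.7 (not reproduced in this section) defines $\sigma_d$ as the Pearson χ² divided by the degrees of freedom. That reading reproduces the values in Table 4.1 (for example, 1823/424 ≈ 4.30 for the Poisson total model).

The same table values also give the percentage by which the Poisson scaled deviance exceeds the degrees of freedom. All function names are illustrative.

```python
from typing import Optional, Sequence


def pearson_chi2(observed: Sequence[float], predicted: Sequence[float],
                 k: Optional[float] = None) -> float:
    """Pearson chi-square statistic.

    The variance is mu under the Poisson assumption (k is None) and
    mu + mu^2 / K under the negative binomial assumption.
    """
    total = 0.0
    for y, mu in zip(observed, predicted):
        variance = mu if k is None else mu + mu ** 2 / k
        total += (y - mu) ** 2 / variance
    return total


def dispersion_parameter(pearson: float, dof: int) -> float:
    """Dispersion parameter, assumed here to be Pearson chi-square / dof."""
    return pearson / dof


def deviance_excess_percent(scaled_deviance: float, dof: int) -> float:
    """Percentage by which the scaled deviance exceeds the degrees of freedom."""
    return (scaled_deviance / dof - 1.0) * 100.0


# Values taken from Table 4.1
print(round(dispersion_parameter(1823, 424), 2))  # total model, Poisson
print(round(dispersion_parameter(459, 424), 2))   # total model, negative binomial
print(round(deviance_excess_percent(734, 347)))   # Vancouver Island, Poisson
print(round(deviance_excess_percent(942, 238)))   # 4-leg intersection, Poisson
```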
Under the latter distribution, $\sigma_d$ ranges from 1.02 for the Lower Mainland model to 1.12 for the T-intersection and Surrey control type models. This indicates that the dispersion of the data is satisfactorily explained by the negative binomial distribution.

Table 4.1 Comparison between Poisson and Negative Binomial Distributions

Model                         Error    a0      a1      a2      b1       σd    K     SD    DoF  Pearson χ²  χ²(95%)  Error²  Closer
Total                         Poisson  1.4833  0.4067  0.6086  -        4.30  -     1577  424  1823        472      12494   44%
Total                         Neg bin  1.4929  0.3839  0.7044  -        1.08  1.97  399   424  459         472      12996   56%
Total with intersection type  Poisson  0.5906  0.4336  0.5937  0.5268   3.86  -     1436  423  1634        471      11687   44%
Total with intersection type  Neg bin  0.5776  0.4221  0.6480  0.5379   1.06  2.2   394   423  449         471      11883   56%
T-intersection                Poisson  0.6717  0.5809  0.5902  -        3.26  -     485   183  597         214      3267    52%
T-intersection                Neg bin  0.9333  0.4531  0.5856  -        1.12  2.35  164   183  205         214      3347    48%
4-leg intersection            Poisson  1.9007  0.3884  0.5944  -        4.38  -     942   238  1042        274      8272    43%
4-leg intersection            Neg bin  1.6947  0.4099  0.7065  -        1.06  2.17  230   238  251         274      8770    57%
Vancouver Island              Poisson  1.3327  0.3231  0.5240  -        2.57  -     734   347  893         390      3879    42%
Vancouver Island              Neg bin  1.3807  0.3042  0.5488  -        1.10  2.92  302   347  383         390      3892    58%
Lower Mainland                Poisson  6.7666  0.2036  0.2474  -        3.35  -     250   74   248         94       3633    44%
Lower Mainland                Neg bin  6.5929  0.2011  0.2864  -        1.02  6.27  81    74   76          94       3681    56%
Surrey total                  Poisson  8.5677  0.1529  0.1720  -        2.97  -     152   53   158         70       2431    41%
Surrey total                  Neg bin  8.4401  0.1516  0.1907  -        1.10  8.89  56    53   58          70       2440    59%
Surrey control type           Poisson  8.9442  0.1647  0.1958  -0.0570  3.00  -     150   52   156         69       2417    52%
Surrey control type           Neg bin  8.8906  0.1645  0.2256  -0.0699  1.12  9.10  55    52   58          69       2438    48%

(σd = dispersion parameter; SD = scaled deviance; DoF = degrees of freedom; Error² = sum of squared errors; Closer = share of predicted accidents closer to the observed accidents)

In addition to the dispersion parameter, Table 4.1 shows other measures for comparing the two approaches. These are the scaled deviance, the Pearson χ², the sum of squared errors, and the share of predicted accidents closer to the observed accidents.

The scaled deviance of the Poisson models is considerably greater for all models. It exceeds the number of degrees of freedom by amounts ranging from 112% for the Vancouver Island model to 296% for the 4-leg model. In contrast, for the negative binomial models the scaled deviance is relatively close to the number of degrees of freedom, which indicates a reasonably good fit.

Regarding the other comparative measures, Table 4.1 shows that the sum of squared errors is slightly smaller for the Poisson models than for the negative binomial models. However, for six of the eight models the negative binomial assumption yields more predicted values that are closer to the observed data. Thus most observations are fitted better by the negative binomial models. The observations that the Poisson models fit better, however, show larger differences from the observed values under the negative binomial models.

4.2 Approaches for Estimating the Negative Binomial Distribution Parameter K

There are several approaches to estimating the parameter K of the negative binomial error distribution (Famoye, 1997). The macro library of the GLIM software package contains three methods: maximum likelihood and two methods of moments, called mean χ² and mean deviance. In addition, Kulmala (1995), following Maycock and Hall (1984), proposed a method of moments in which K is initially calculated from the estimates of the Poisson model. All of these methods are iterative.

The method of maximum likelihood has been the most widely used (Hauer et al., 1988; Bonneson and McCoy, 1993; Maher and Summersgill, 1996).
According to Lawless (1987), this method is based on the log-likelihood function, which is the natural logarithm of the joint probability function of the negative binomial distribution (equation 2.5). This is a function of μ and K, where μ is in turn a function of the parameter estimates $a_p$. For selected values of K, the iterative process maximizes the log-likelihood function with respect to the parameter estimates $a_p$. It continues until the value of K at which the log-likelihood reaches its maximum has been found.

The mean χ² method consists of fitting the Pearson χ² value to the number of degrees of freedom. In the first iteration, K is solved from the Pearson χ² equation, with the initial estimates calculated from the Poisson distribution. Given this initial value of K, new parameters are estimated, and the process is repeated until convergence.

The mean deviance method is similar to the mean χ² method. The main difference is that the scaled deviance, rather than the Pearson χ², is forced to equal the number of degrees of freedom.

The method of moments proposed by Kulmala (1995) and Maycock and Hall (1984) starts by estimating K from the following equation:

$$K = \frac{\sum_{i=1}^{n} E(A)_i^2}{\sum_{i=1}^{n} \left(error_i^2 - E(A)_i\right)} \qquad (4.1)$$

where the predicted values $E(A)_i$ are initially obtained from the Poisson model, and $error_i$ is the difference between the observed and predicted accidents at intersection i. This K value is then passed to a GLIM macro to estimate the parameters under the negative binomial distribution. As with the previous methods of moments, the process is repeated until convergence. According to Kulmala (1995), the estimates obtained with this method deviate by less than 5% from those produced by the maximum likelihood method.

All four methods were used to estimate the parameter K. The results obtained in the GLIM sessions are shown in Appendix IV, and Table 4.2 summarizes the results for each method.
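Equation (4.1) is a single moment update of K, sketched below in Python. The sketch assumes that $error_i$ is the observed minus the predicted count.

It does not reproduce the GLIM refitting step. In the full procedure, the returned K would be passed back to a negative binomial fit, and the update would be repeated with the new predicted values until convergence. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def moment_estimate_k(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """One moment update of K following equation (4.1):

    K = sum(E(A)_i^2) / sum(error_i^2 - E(A)_i), where error_i = observed_i - E(A)_i.
    """
    numerator = sum(mu ** 2 for mu in predicted)
    denominator = sum((y - mu) ** 2 - mu for y, mu in zip(observed, predicted))
    if denominator <= 0:
        # No extra-Poisson variation in this sample, so K is not finite.
        raise ValueError("no overdispersion: the Poisson model is adequate")
    return numerator / denominator


# Hypothetical toy data, not taken from the thesis
print(moment_estimate_k([2, 0, 5, 9], [1, 1, 4, 3]))
```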
The table shows that the parameter values agree to the first two decimal places for all methods, and the t-ratios show that all variables are significant in every case. This indicates a close similarity between the four methods.

Table 4.2 Results of Different Negative Binomial Methods in the Total Model

Parameter          Maximum likelihood   Mean χ²            Mean deviance      Moments (Kulmala)
                   Value    t-ratio     Value    t-ratio   Value    t-ratio   Value    t-ratio
a0                 1.4929   3.2         1.4963   3.1       1.4905   3.3       1.4947   3.4
a1                 0.3839   7.8         0.3827   7.5       0.3850   8.0       0.3832   8.0
a2                 0.7044   12.4        0.7058   12.0      0.7023   12.8      0.7054   12.8
K                  1.97                 1.76               2.15               1.85
Scaled deviance    399                  370                424                382
Deg. of freedom    424                  424                424                424
Pearson χ²         459                  424                489                439
χ²(95%)            472                  472                472                472
Error²             12996                13006              12981              13003

The parameter K differs more between methods than the model parameters do. The highest K value is obtained with the mean deviance method, followed by the maximum likelihood method. Under the criterion of maximizing K, which reduces the variance, the best method would be mean deviance and the worst would be mean χ². However, according to the Pearson χ² statistic, the method with the highest K is not significant at the 95% confidence level. Therefore, the best method is maximum likelihood, which has the second highest K.

Table 4.2 also shows that, for all methods, the scaled deviance does not exceed the number of degrees of freedom. The Pearson χ² statistic is significant at the 95% confidence level for all models except the mean deviance model. The sums of squared errors are also quite similar, reflecting the similarity of the model parameters.

Figure 4.1 shows the predicted accidents of the total model as a function of major road traffic volume for the different methods.
The curves in this figure practically coincide, which confirms the similarities found in Table 4.2.

This analysis shows that, except for the mean deviance method, which was not significant according to the Pearson χ² statistic, the other three methods yield approximately the same results. Of these three significant methods, maximum likelihood yields the highest K value, and for this reason it is regarded as the most appropriate method.

[Figure 4.1 Predicted Accidents using Different Methods to Obtain K. Horizontal axis: Major Road AADT (thousands); series: Maximum Likelihood, Mean Chi-Square, Mean Deviance, Moments (Kulmala)]

4.3 Conclusion

This chapter was intended to demonstrate the advantages of the methodology used in Chapter Three to derive the accident prediction models. First, a comparison demonstrated that accident data for urban unsignalized intersections follow the negative binomial distribution rather than the Poisson distribution.

Next, it was demonstrated that the maximum likelihood method is the most appropriate for calculating the negative binomial parameter K, because it yields the maximum value of K among the significant models. However, the mean χ² method and the method of moments proposed by Kulmala also yielded significant results similar to those of the maximum likelihood method.

CHAPTER V APPLICATIONS

5.0 Introduction

As described earlier, accident prediction models have several applications, and this chapter describes five of them. The first four relate to the use of the Empirical Bayes refinement. They are the identification of accident-prone locations, the development of critical accident frequency curves, the ranking of the identified accident-prone locations, and before and after safety evaluation.
The fifth application provides a safety-planning example: it compares the safety performance of a 4-leg intersection with that of two staggered T-intersections carrying the same traffic volume.

The Empirical Bayes refinement applications are demonstrated using the model that relates the total number of accidents to traffic flows (Table 3.4), because it is the most general model. The Vancouver Island and Lower Mainland models are also used in the identification and ranking of accident-prone locations.

5.1 Empirical Bayes Refinement

As mentioned in Section 2.5, the main goal of the Empirical Bayes refinement is to yield a more accurate, location-specific safety estimate. It does so by combining the observed number of accidents at the location with the predicted number of accidents obtained from the GLIM model.

To illustrate this process, assume that an unsignalized intersection has the following data:

Major road ADT = 15,000 veh/day
Minor road ADT = 2,000 veh/day
Observed accidents = 11 acc/3 years

Using the model from Table 3.4, the predicted safety of this intersection is:

$$E(A) = 1.4929 \times \left(\frac{15{,}000}{1{,}000}\right)^{0.3839} \times \left(\frac{2{,}000}{1{,}000}\right)^{0.7044} = 6.88 \text{ acc/3 years}$$

Using equations 2.17 and 2.18, the EB safety estimate and its variance are, respectively:

$$EB_{safety\ estimate} = \left(\frac{1.97}{1.97 + 6.88}\right) \times 6.88 + \left(\frac{6.88}{6.88 + 1.97}\right) \times 11 = 10.08 \text{ acc/3 years}$$

$$Var(EB_{safety\ estimate}) = \left(\frac{6.88}{6.88 + 1.97}\right)^2 \times 1.97 + \left(\frac{6.88}{6.88 + 1.97}\right)^2 \times 11 = 7.84 \text{ (acc/3 years)}^2$$

In this example the expected number of accidents is reduced from 11 to 10.08, which corresponds to a regression-to-the-mean correction of about eight percent.

Figure 5.1 illustrates the Empirical Bayes refined estimates versus the values predicted from the GLIM model.
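The worked example of Section 5.1 can be reproduced with a short Python sketch. The EB weights and variance are written in the form used in the numbers above, with the variance equal to (E(A)/(K + E(A)))² × (K + count). Function names are illustrative.

```python
from typing import Tuple


def predict_total_model(aadt_major: float, aadt_minor: float) -> float:
    """Total model of Table 3.4: predicted accidents per 3 years."""
    return 1.4929 * (aadt_major / 1000.0) ** 0.3839 * (aadt_minor / 1000.0) ** 0.7044


def eb_refine(predicted: float, k: float, observed: float) -> Tuple[float, float]:
    """EB safety estimate and its variance, as in the Section 5.1 example."""
    weight = k / (k + predicted)  # weight given to the model prediction
    eb = weight * predicted + (1.0 - weight) * observed
    variance = (predicted / (k + predicted)) ** 2 * (k + observed)
    return eb, variance


predicted = predict_total_model(15000, 2000)
eb, var = eb_refine(predicted, k=1.97, observed=11)
print(f"predicted={predicted:.2f} EB={eb:.2f} var={var:.2f}")
# predicted=6.88 EB=10.08 var=7.84
```

A small K puts more weight on the observed count, and a large K puts more weight on the model prediction.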
Notice that the EB estimates are much closer to the 45° line. This indicates an average regression-to-the-mean correction of 35%, although for some extreme cases the correction reaches 150%.

[Figure 5.1 Predicted vs. EB Refined Number of Accidents for Total Model. Horizontal axis: Observed Accidents (acc/3 yrs)]

5.2 Identification of Accident Prone Locations

Accident prone locations (APLs) are defined as locations that exhibit a significantly higher number of accidents than a specific norm. Because accident occurrence is inherently random, statistical techniques that account for this randomness should be used when identifying APLs. The EB refinement method can be used to identify APLs according to the following process (Belanger, 1994):

1. Estimate the predicted number of accidents and its variance for the intersection, using the appropriate GLIM model. This follows a gamma distribution (the prior distribution) with parameters $\alpha_1$ and $\beta_1$, where:

$$\alpha_1 = K, \qquad \beta_1 = \frac{K}{E(A)} \qquad (5.1)$$

2. Determine the appropriate point of comparison based on the mean and variance obtained in step 1. Usually the 50th percentile ($P_{50}$) is used as the point of comparison. $P_{50}$ is calculated such that:

$$\int_0^{P_{50}} \frac{\beta_1^{\alpha_1} \lambda^{\alpha_1 - 1} e^{-\beta_1 \lambda}}{\Gamma(\alpha_1)}\, d\lambda = 0.5 \qquad (5.2)$$

3. Calculate the EB safety estimate and its variance from equations (2.17) and (2.18), respectively. This is also a gamma distribution (the posterior distribution), with parameters $\alpha_2$ and $\beta_2$:

$$\alpha_2 = K + count, \qquad \beta_2 = \frac{K}{E(A)} + 1, \qquad EB = \frac{\alpha_2}{\beta_2}, \qquad Var(EB) = \frac{\alpha_2}{\beta_2^2} \qquad (5.3)$$

The probability density function of the posterior distribution is then:

$$f_{EB}(\lambda) = \frac{\left(K/E(A) + 1\right)^{K + count} \lambda^{K + count - 1} e^{-\left(K/E(A) + 1\right)\lambda}}{\Gamma(K + count)} \qquad (5.4)$$

4. Identify the location as accident-prone if there is a significant probability that the intersection's safety estimate exceeds the $P_{50}$ value.
Thus, the location is identified as accident-prone if:

$$1 - \int_0^{P_{50}} \frac{\left(K/E(A) + 1\right)^{K + count} \lambda^{K + count - 1} e^{-\left(K/E(A) + 1\right)\lambda}}{\Gamma(K + count)}\, d\lambda > \delta \qquad (5.5)$$

where δ represents the desired confidence level (usually 0.95).

For the example given in the previous section, the predicted number of accidents is 6.88 acc/3 yr and its variance is 24.67 (acc/3 yr)². Equation 5.2 is then used to obtain the $P_{50}$ value:

$$\int_0^{P_{50}} \frac{(1.97/6.88)^{1.97} \lambda^{1.97 - 1} e^{-(1.97/6.88)\lambda}}{\Gamma(1.97)}\, d\lambda = 0.5$$

Solving the integral gives a $P_{50}$ value of 5.75 acc/3 yr.

From the previous section, the EB estimate is 10.08 acc/3 yr and its variance is 7.84 (acc/3 yr)². Using equation 5.5, the left-hand side of the inequality is:

$$1 - \int_0^{5.75} \frac{(1.97/6.88 + 1)^{1.97 + 11} \lambda^{1.97 + 11 - 1} e^{-(1.97/6.88 + 1)\lambda}}{\Gamma(1.97 + 11)}\, d\lambda = 0.96$$

This indicates that there is a significant probability (96%) of exceeding the $P_{50}$ value, so the intersection can be considered accident-prone. Figure 5.2 shows a graphical representation of this example.

[Figure 5.2 Identification of Accident Prone Locations. Horizontal axis: Accidents/3 years]

5.3 Critical Accident Frequency Curves

The process of identifying accident-prone locations described in the previous section involves considerable computational effort. To facilitate this process, critical accident frequency curves can be developed for each GLIM model. For a given GLIM model and confidence level, a critical curve indicates the number of observed accidents that must be exceeded for a location to be classified as accident-prone.

The procedure for obtaining these critical curves is iterative and makes use of equations (5.2) and (5.5). The initial data are the predicted numbers of accidents from a GLIM model, together with its K parameter. For every predicted value, the $P_{50}$ value is calculated using equation (5.2).
This value is then used in equation (5.5). For a given level of confidence, the equation is solved iteratively for the observed number of accidents (the variable count in equation (5.5)) that satisfies that level. The critical curve is obtained by joining all the critical points on a chart of predicted versus observed accidents.

As an example, Figures 5.3, 5.4 and 5.5 show these curves for the total model (Table 3.4) and for the Vancouver Island and Lower Mainland models (Table 3.8). Each figure shows three curves, for the 90%, 95%, and 99% confidence levels. To illustrate their use, consider the example described in Section 5.1. Using the total model and the given traffic volumes, 6.88 accidents/3 years are predicted. For this prediction and a 99% confidence level, at least 13 accidents/3 years must be observed for the intersection to be considered accident-prone (Figure 5.3). Table 5.1 shows the number of APLs identified by the three models at different confidence levels.

Table 5.1 Number of Accident Prone Locations

                          Level of confidence
Model                     90%    95%    99%
Total Model               82     67     51
Vancouver Island Model    38     30     21
Lower Mainland Model      21     14     6

[Figure 5.3 Critical Curves for Total Model. Horizontal axis: Predicted Accidents (acc/3 years); points by city: Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo]

[Figure 5.4 Critical Curves for Vancouver Island Model. Points by city: Victoria, Nanaimo]

[Figure 5.5 Critical Curves for Lower Mainland Model. Horizontal axis: Predicted Accidents (acc/3 years); points by city: Surrey, Coquitlam, Vancouver, Burnaby]

The critical curves can also be extended to different K values. Figure 5.6 shows the critical curves for eight different values of K at a confidence level of 95%.
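The identification test (equations 5.1 to 5.5) and the iterative search for one point of a critical curve can be sketched in Python using only the standard library. Two choices are made for this sketch and are not taken from the thesis. The gamma CDF is evaluated with the power series of the regularized lower incomplete gamma function, and percentiles and critical counts are found by bisection.

All function names are hypothetical. A full critical curve would be traced by calling `critical_count` over a range of predicted values.

```python
import math


def gamma_cdf(x: float, shape: float, rate: float) -> float:
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    z = rate * x
    if z <= 0.0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total and n < 100_000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def gamma_quantile(p: float, shape: float, rate: float) -> float:
    """Quantile of the gamma distribution, found by bisection on gamma_cdf."""
    lo, hi = 0.0, shape / rate
    while gamma_cdf(hi, shape, rate) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p50(predicted: float, k: float) -> float:
    """Equation (5.2): median of the prior gamma(alpha1 = K, beta1 = K/E(A))."""
    return gamma_quantile(0.5, k, k / predicted)


def prob_exceeds(threshold: float, predicted: float, k: float, count: float) -> float:
    """Left-hand side of (5.5): posterior gamma(K + count, K/E(A) + 1) above threshold."""
    return 1.0 - gamma_cdf(threshold, k + count, k / predicted + 1.0)


def critical_count(predicted: float, k: float, delta: float, tol: float = 1e-6) -> float:
    """Observed count at which (5.5) is just met: one point of a critical curve."""
    threshold = p50(predicted, k)
    lo, hi = 0.0, max(1.0, predicted)
    if prob_exceeds(threshold, predicted, k, lo) >= delta:
        return lo
    while prob_exceeds(threshold, predicted, k, hi) < delta:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_exceeds(threshold, predicted, k, mid) < delta:
            lo = mid
        else:
            hi = mid
    return hi


# Section 5.1 example: E(A) = 6.88 acc/3 yr, K = 1.97, 11 observed accidents
E, K = 6.88, 1.97
threshold = p50(E, K)
print(round(threshold, 2), round(prob_exceeds(threshold, E, K, 11), 2))
print(math.ceil(critical_count(E, K, 0.99)))
```

For the Section 5.1 example, this sketch gives a $P_{50}$ of about 5.75 and an exceedance probability of about 0.96. At the 99% level it gives a critical count that rounds up to 13, in line with the values reported above.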
The advantage of this kind of curve is that it can be used with any negative binomial model. The predicted number of accidents is calculated from the model, and the critical number of accidents is read from the curve corresponding to the model's K value. The disadvantage is that the results are less accurate than those from the model-specific curves. In most cases there is no curve for the exact K value (e.g., K = 2.17), so the critical value is estimated from the curve with the closest K.

[Figure 5.6 Critical Curves for Different Values of K. Horizontal axis: Predicted Accidents]

Note that the higher the K value, the higher the critical number of accidents. The rationale for this is illustrated in Figure 5.7.

Figure 5.7-a shows the same example as in Section 5.1, but with K assumed to be 1.0 (a low K). The observed number of accidents is 9.05 acc/3 years, which is the critical number of accidents at the 95% confidence level. Under these conditions the predicted number of accidents is still 6.88 acc/3 years. Because K and the observed number of accidents have changed, however, $P_{50}$ is 4.77 acc/3 years and the EB estimate is 8.77 acc/3 years. The probability of exceeding $P_{50}$ is 95%, which is the critical condition.

Figure 5.7-a also shows that at low values of K the prior distribution is strongly skewed, with most of its mass concentrated near zero, and the EB estimate is close to the observed number of accidents. This is because low K values increase the variance, which increases the uncertainty about the predicted value. As a result, the EB estimate is closer to the observed value than to the predicted one.

Figure 5.7-b shows the same model, but with a considerably higher K value (K = 20).
The observed number of accidents is the same as in Figure 5.7-a, but because K has increased, this value is no longer critical. The EB estimate is now closer to the predicted number of accidents than to the observed one. The variance has decreased, so the GLIM model estimate is more reliable. The prior distribution is also less skewed and closer to the posterior distribution.

Finding the critical number of accidents under these conditions (Figure 5.7-b) requires a considerably higher observed number of accidents. Figure 5.7-c shows that the critical value is 15.65 accidents/3 years, an increase of about 6.6 accidents/3 years over the previous conditions. Over the same change, the EB estimate increases by only about 1.7 acc/3 years and remains closer to the predicted number of accidents.

[Figure 5.7 Comparison of Critical Accidents for Different K Values. Panels: a) K=1.0 and Observed Accidents=Critical; b) K=20 and Observed Accidents=Critical for K=1; c) K=20 and Observed Accidents=Critical. Horizontal axis: Accidents/3 years]

5.4 Ranking of Accident Prone Locations

The methods for identifying accident-prone locations explained in the previous two sections can also be used to rank these locations, using either of two criteria. The first is the ratio between the EB estimate and the predicted frequency (from the GLIM model) for the accident-prone locations identified in the previous section. This ratio represents the deviation of the intersection from the "norm": the higher the ratio, the more accident-prone the intersection. This criterion ensures that the safety level at each intersection is compared with that of other intersections with similar characteristics.
The second criterion is the difference between the EB estimate and the predicted frequency for the accident-prone locations. This difference is a good indication of the expected safety benefits and is useful for estimating the pre-implementation safety benefits of countermeasures. Unlike the ratio criterion, it can be used to quantify economic benefits.

Table 5.2 compares the two ranking criteria for the Vancouver Island model. Twenty-one accident-prone locations (APLs) were identified at the 99% confidence level. The table shows both the difference (EB - Predicted) and the ratio (EB/Predicted) for all APLs. As shown in Table 5.2, the difference in rank between the two criteria ranges from 1 to 15, with an average value of 4.9, and appears to be larger for the top-ranked intersections. This can be explained by the different goals of the two criteria. The difference criterion favors intersections with high accident frequencies, which are usually more cost-effective to treat. The ratio criterion considers the deviation from the expected values and its variance, regardless of the number of accidents observed. Road authorities can use it to ensure that the safety of different locations is within acceptable levels.

Table 5.2 Ranking of APLs for the Vancouver Island Model

No.  Intersection              Observed   Predicted  EB       EB-Pred  EB/Pred  Rank       Rank       Diff.
                               Frequency  Accidents  Refined                    (EB-Pred)  (EB/Pred)  Rank
1    Blanshard-Topaz           24         7.3        19.2     11.9     2.6      1          5          4
2    Cook-Kiwanis              25         9.9        21.6     11.7     2.2      2          14         12
3    Douglas-Tolmie            24         11.1       21.3     10.2     1.9      3          18         15
4    Finlayson-Nanaimo         18         4.3        12.4     8.2      2.9      4          3          1
5    Government-Discovery      18         3.4        11.3     7.9      3.3      5          1          4
6    Vancouver-Balmoral        15         4.1        10.5     6.4      2.5      6          7          1
7    Douglas-Princess          15         3.9        10.3     6.4      2.6      7          6          1
8    Cook-View                 15         3.7        10.0     6.3      2.7      8          4          4
9    Southgate-Vancouver       14         4.7        10.4     5.7      2.2      9          12         3
10   Bowen-Pine-Access         16         8.6        14.1     5.5      1.6      10         21         11
11   Dallas-Douglas            14         2.4        7.6      5.2      3.2      11         2          9
12   Douglas-Discovery         13         3.9        9.1      5.2      2.3      12         11         1
13   Albert-Fourth-Pine-Park   13         3.7        8.9      5.2      2.4      13         9          4
14   Quadra-Topaz              13         3.6        8.7      5.2      2.5      14         8          6
15   Wakesiah-Fourth           13         4.8        9.9      5.1      2.1      15         17         2
16   Fairfield-Foulbay         12         4.1        8.7      4.6      2.1      16         15         1
17   Douglas-Spruce            12         5.4        9.7      4.3      1.8      17         20         3
18   Shelbourne-Pearl          11         3.4        7.5      4.1      2.2      18         13         5
19   Quadra-Burdett            11         3.7        7.8      4.1      2.1      19         16         3
20   Hillside-Graham           11         2.9        6.9      4.0      2.4      20         10         10
21   Quadra-Pembroke           11         4.6        8.6      3.9      1.8      21         19         2

Figure 5.8 shows the values of the two ranking criteria for the top 10 APLs. It shows that intersections 1, 3, 4, 5, 6, and 7 are among the top 10 intersections under both methods, with an average ranking difference of 5.6. Intersections 2, 8, 9, and 10 are in the top 10 of the "difference" ranking but not of the "ratio" ranking. Despite their high expected benefits, their degree of proneness is not among the top 10. Conversely, intersections 12, 14, 18 and 21 show a high degree of proneness, but their expected benefits are not among the top 10.
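Both ranking criteria can be sketched in Python using the rounded EB and predicted values of Table 5.2. The helper name `rank_by` is illustrative.

Because the table values are rounded to one decimal place, near-ties may be ordered differently from Table 5.2. An example is intersections 1 and 7 under the ratio criterion.

```python
# (Int. No., EB refined, predicted accidents) taken from Table 5.2
APLS = [
    (1, 19.2, 7.3), (2, 21.6, 9.9), (3, 21.3, 11.1), (4, 12.4, 4.3),
    (5, 11.3, 3.4), (6, 10.5, 4.1), (7, 10.3, 3.9), (8, 10.0, 3.7),
    (9, 10.4, 4.7), (10, 14.1, 8.6), (11, 7.6, 2.4), (12, 9.1, 3.9),
    (13, 8.9, 3.7), (14, 8.7, 3.6), (15, 9.9, 4.8), (16, 8.7, 4.1),
    (17, 9.7, 5.4), (18, 7.5, 3.4), (19, 7.8, 3.7), (20, 6.9, 2.9),
    (21, 8.6, 4.6),
]


def rank_by(key):
    """Intersection numbers ordered from most to least accident-prone by `key`."""
    return [no for no, _eb, _pred in sorted(APLS, key=key, reverse=True)]


by_difference = rank_by(lambda row: row[1] - row[2])  # EB - Predicted
by_ratio = rank_by(lambda row: row[1] / row[2])       # EB / Predicted

print(by_difference[:5])
print(by_ratio[:5])
```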
[Figure 5.8 Ranking of Top 10 APLs for Island Model: Ratio Ranking and Difference Ranking. The number inside each bar denotes the intersection number according to Table 5.2]

Other ranking criteria can also be based on a ratio or a difference. These involve other quantities derived from accident prediction models, such as predicted vs. observed accidents, observed accidents vs. the critical curve value, and observed accidents vs. EB estimates. Such criteria can be implemented using the same methodology as in this section. There is little research on ranking criteria that use accident prediction models, and this area clearly needs further study.

5.5 Before and After Studies

The effect of a safety measure is often studied by comparing two quantities: the number of accidents observed after the measure is implemented, and the number that would have been expected had it not been implemented. In simple before and after studies, the observed number of accidents in the period before implementation is used to estimate the latter value. However, accident occurrence varies randomly (e.g., the regression-to-the-mean effect). The observed number of accidents before implementation may therefore be a poor estimate of what would have happened without the measure. An alternative and more accurate approach is to use the EB refinement process.

Using the same example as before, assume that a safety measure to reduce the number of accidents at the intersection was implemented, and that 8 accidents were observed in the three years following implementation. The effectiveness of the measure can then be calculated as:

$$\text{Measure of Effectiveness} = 1 - \frac{8}{10.08} = 0.21$$

which indicates a 21% reduction in total accidents as a result of the treatment.
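The effectiveness calculation can be sketched in Python. The sketch assumes before and after periods of equal length (three years each, as in the example) and makes no adjustment for traffic volume changes between the periods. The EB weights follow the Section 5.1 worked example, and the function names are illustrative.

```python
def eb_expected(predicted: float, k: float, observed: float) -> float:
    """EB estimate of accidents expected without treatment (Section 5.1 weights)."""
    w = k / (k + predicted)
    return w * predicted + (1.0 - w) * observed


def effectiveness(observed_after: float, expected_without_treatment: float) -> float:
    """Measure of effectiveness = 1 - observed after / expected without treatment."""
    return 1.0 - observed_after / expected_without_treatment


expected = eb_expected(predicted=6.88, k=1.97, observed=11)  # about 10.08
print(round(effectiveness(8, expected), 2))  # 0.21
```

Using the raw before count (11) instead of the EB estimate would give 1 - 8/11 ≈ 0.27. That larger value would overstate the effect of the treatment because it ignores regression to the mean.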
Accident prediction models are especially valuable in before and after studies because the traditional approach, which relies on a comparison (reference) group, is difficult to carry out. The reference group should be large and homogeneous enough for an accurate analysis, and defining a group with these features is difficult. Accident prediction models overcome this difficulty, since they represent local conditions and take over the role of the traditional reference group.

5.6 Safety Comparison of Staggered T and 4-leg Intersections

Several researchers have compared the safety performance of 4-leg intersections and staggered T-intersections. Kulmala (1995) found that, in general, converting a 4-leg intersection into two staggered T-intersections reduces the number of injury accidents if the traffic entering from the minor road exceeds 5% of the total. He also found that if 50% of the total traffic enters from the minor road, staggering reduces injury accidents by 23%. Kulmala (1995) found his results consistent with several Nordic studies, in which staggering decreased the number of injury accidents by 0% to 20%.

To confirm these results, 4-leg and staggered T-intersections were compared using the models in Table 3.5 (the separate T and 4-leg intersection models). According to the analysis in Section 3.2.2, the total model with intersection type (Table 3.6) may also be used and yields approximately the same results. The following assumptions were made:

1. The traffic volumes on the major and minor roads of the 4-leg intersection are $V_1$ and $V_2$, respectively (expressed in AADT).

2. For the two staggered T-intersections, the traffic volume on the major road is $V_1$, while each minor approach carries $V_2/2$ (expressed in AADT). This assumption ensures that the traffic volume is the same in both scenarios.

3. The two staggered intersections do not affect each other (isolated intersections). This assumption depends, of course, on the distance between the two intersections.

Figure 5.9 shows the results of the comparison for three different minor road traffic volumes. The results indicate that staggering reduces the predicted number of accidents. This reduction increases as the traffic volume on the major or minor road increases. Note, however, that the degree of reduction would vary with the ratio of major to minor road traffic volumes.

[Figure 5.9 Staggered T vs 4-leg Intersections Safety Comparison. Panels: a) Minor Road AADT=500 veh/day; b) Minor Road AADT=2,000 veh/day; c) Minor Road AADT=10,000 veh/day. Horizontal axis: Major Road AADT (thousands)]

5.7 Conclusion

This chapter has shown five applications of accident prediction models. Most of them use the EB refinement method to reduce the regression-to-the-mean effect.

Accident prediction models have been shown to be useful for identifying accident-prone locations (APLs) at a probabilistic confidence level, using both analytical and graphical methods. The APLs can also be ranked by either of two criteria, difference or ratio, depending on the particular objectives of road authorities.
In addition, accident prediction models can be used to evaluate the safety effect of a countermeasure without having to define a reference group, because the GLIM models capture the characteristics of the location.

Finally, it was found that staggered T-intersections are safer than 4-leg intersections, a finding that road planning authorities should take into account. These results agree with those found in the literature.

CHAPTER VI CONCLUSIONS AND RECOMMENDATIONS

6.1 Conclusions

The main objective of this project was to develop accident prediction models for estimating the safety potential of urban unsignalized (T and 4-leg) intersections in the Greater Vancouver Regional District (GVRD) and Vancouver Island on the basis of their traffic characteristics. The models were developed using the generalized linear regression modeling (GLIM) approach, which addresses and overcomes the shortcomings associated with the conventional linear regression approach. The safety predictions obtained from the GLIM models can be refined using the Empirical Bayes (EB) approach to provide more accurate, site-specific safety estimates. This complementary EB approach can significantly reduce the regression-to-the-mean bias that is inherent in observed accident counts.

This study used sample accident and traffic volume data for unsignalized (both T and 4-leg) intersections located in urban areas of the GVRD and Vancouver Island. The data included a total of 427 intersections located in the cities of Victoria, Surrey, Nanaimo, Coquitlam, Burnaby and Vancouver. The information available for each intersection included the total number of accidents in the 1993-1995 period, the traffic volumes on the major and minor roads expressed as Average Annual Daily Traffic (AADT), and the type of intersection (T or 4-leg).
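The EB refinement described in this section also underlies the evaluation of a countermeasure without a reference group. The sketch below follows one common EB before-and-after formulation (Hauer's), which is assumed here rather than taken from this thesis. The model's predictions for the before and after periods give a traffic adjustment factor r = E_after/E_before. The before-period EB estimate, scaled by r, gives π, the number of accidents expected after treatment had the countermeasure not been applied. The index of effectiveness then compares the observed after-period count λ with π, corrected for the uncertainty in π. All names and the example numbers are hypothetical.

```python
def eb_estimate(pred, kappa, observed):
    """EB estimate and its variance for one period, from the gamma posterior
    with shape kappa + observed and rate kappa / pred + 1."""
    rate = kappa / pred + 1.0
    shape = kappa + observed
    return shape / rate, shape / rate ** 2


def before_after(pred_before, obs_before, pred_after, obs_after, kappa):
    """EB before-and-after evaluation of a single treated site."""
    eb, var_eb = eb_estimate(pred_before, kappa, obs_before)
    r = pred_after / pred_before          # adjusts for traffic/period changes
    pi = r * eb                           # expected 'after' count without treatment
    var_pi = r ** 2 * var_eb
    lam = obs_after                       # observed 'after' count (Poisson)
    rel_var_pi = var_pi / pi ** 2
    rel_var_lam = 1.0 / lam if lam > 0 else 0.0  # Var(lam)/lam**2 with Var(lam)=lam
    index = (lam / pi) / (1.0 + rel_var_pi)
    var_index = index ** 2 * (rel_var_lam + rel_var_pi) / (1.0 + rel_var_pi) ** 2
    return {
        "pi": pi,
        "index": index,                   # < 1 means fewer accidents than expected
        "se_index": var_index ** 0.5,
        "reduction_pct": 100.0 * (1.0 - index),
    }


if __name__ == "__main__":
    # Hypothetical site: 12 accidents observed before (model predicts 6.0),
    # 5 observed after (model predicts 6.3 after traffic growth);
    # kappa = 2.172 is the 4-leg model THETA from Appendix III.
    print(before_after(6.0, 12, 6.3, 5, kappa=2.172))
```

The model predictions play the role that the reference group plays in a traditional comparison study. They supply both the expected safety level of similar sites and the adjustment for traffic changes between the two periods.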
Four categories of models were developed in this study: (1) models for the total number of accidents; (2) separate models for T and 4-leg intersections; (3) separate models for different regions (Vancouver Island, the Lower Mainland and Surrey); and (4) a model for Surrey including intersection control. Table 6.1 summarizes the models' results.

The models developed in this thesis assume a negative binomial distribution, which, unlike the Poisson distribution, can account for the dispersion observed in the data. In addition, different tests showed that the maximum likelihood method yields the most appropriate parameters under the negative binomial assumption.

Table 6.1 Summary of Accident Prediction Models

Model forms ($AADT_{majrd}$ and $AADT_{minrd}$ are the major- and minor-road volumes in veh/day):

Model for the total number of accidents: $Acc/3\,yrs = 1.4929 \times (AADT_{majrd}/1000)^{0.3839} \times (AADT_{minrd}/1000)^{0.7044}$
T-intersection model: $Acc/3\,yrs = 0.9333 \times (AADT_{majrd}/1000)^{0.4531} \times (AADT_{minrd}/1000)^{0.5856}$
4-leg intersection model: $Acc/3\,yrs = 1.6947 \times (AADT_{majrd}/1000)^{0.4099} \times (AADT_{minrd}/1000)^{0.7065}$
Total model with intersection type: $Acc/3\,yrs = 0.5776 \times (AADT_{majrd}/1000)^{0.4221} \times (AADT_{minrd}/1000)^{0.6480} \times e^{0.5379\,Type}$
Vancouver Island model: $Acc/3\,yrs = 1.3807 \times (AADT_{majrd}/1000)^{0.3042} \times (AADT_{minrd}/1000)^{0.5488}$
Lower Mainland model: $Acc/3\,yrs = 6.5929 \times (AADT_{majrd}/1000)^{0.2011} \times (AADT_{minrd}/1000)^{0.2864}$

Model | t-ratio | SD (dof) | θ | Pearson χ² (χ² test)*
Total number of accidents | a0 = 3.2, a1 = 7.8, a2 = 12.4 | 399 (424) | 1.97 | 459 (472)
T-intersection | a0 = -0.3, a1 = 5.5, a2 = 7.4 | 164 (183) | 2.34 | 205 (214)
4-leg intersection | a0 = 3.6, a1 = 6.8, a2 = 9.1 | 230 (238) | 2.17 | 251 (274)
Total with intersection type | a0 = -2.8, a1 = 8.7, a2 = 11.7, b1 = 6.3 | 394 (423) | 2.23 | 449 (471)
Vancouver Island | a0 = 2.6, a1 = 6.0, a2 = 9.0 | 302 (347) | 2.92 | 383 (390)
Lower Mainland | a0 = 7.6, a1 = 2.4, a2 = 3.7 | 81 (74) | 6.27 | 76 (94)

(a0: constant term; a1, a2: exponents of the major- and minor-road volumes; b1: intersection-type coefficient; SD: scaled deviance)
* Denotes significance at a 95-percent confidence level

Five applications of accident prediction models were presented in this thesis. Four of them involve the Empirical Bayes refinement: identifying accident-prone locations, developing critical accident frequency curves, ranking the identified accident-prone locations, and before-and-after safety evaluation. The fifth application provided a safety-planning example, comparing a 4-leg intersection with two staggered T-intersections.

It was shown that accident prediction models are useful in identifying accident-prone locations (APLs) with a probabilistic confidence level, using both analytical and graphical methods. It is also possible to rank the APLs by two different criteria, difference and ratio, according to the particular objectives of the road authority.

In addition, accident prediction models can be used to evaluate the safety benefits of a countermeasure without having to define a reference group, because the GLIM models capture the characteristics of the location.

Finally, it was found that staggered T-intersections are safer than 4-leg intersections, a finding that road planning authorities should take into account. These results agree with previous research carried out in the Scandinavian countries.
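The safety-planning comparison of Section 5.6 can be sketched directly from the T and 4-leg intersection models listed in Table 6.1. This minimal sketch uses the Section 5.6 assumptions. The 4-leg intersection carries V1 on the major road and V2 on the minor road. Each staggered T-intersection carries V1 on the major road and V2/2 on its minor approach. The two T-intersections are treated as isolated. The function names are hypothetical; volumes are AADT in veh/day and predictions are accidents per 3 years.

```python
def four_leg(major_aadt, minor_aadt):
    """4-leg intersection model (Table 6.1), accidents per 3 years."""
    return 1.6947 * (major_aadt / 1000) ** 0.4099 * (minor_aadt / 1000) ** 0.7065


def t_intersection(major_aadt, minor_aadt):
    """T-intersection model (Table 6.1), accidents per 3 years."""
    return 0.9333 * (major_aadt / 1000) ** 0.4531 * (minor_aadt / 1000) ** 0.5856


def staggered_pair(major_aadt, minor_aadt):
    """Two isolated staggered T-intersections, each receiving half of the
    minor-road volume (assumptions 2 and 3 of Section 5.6)."""
    return 2 * t_intersection(major_aadt, minor_aadt / 2)


if __name__ == "__main__":
    # Minor-road volumes match the three panels of Figure 5.9.
    for minor in (500, 2000, 10000):
        for major in (10000, 20000, 40000):
            fl = four_leg(major, minor)
            st = staggered_pair(major, minor)
            print(f"minor={minor:>6} major={major:>6}  4-leg={fl:6.2f}  "
                  f"staggered={st:6.2f}  reduction={100 * (1 - st / fl):5.1f}%")
```

For example, with a 20,000 veh/day major road and a 2,000 veh/day minor road, these coefficients give roughly 9.4 accidents per 3 years for the 4-leg intersection and about 7.3 for the staggered pair.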
6.2 Recommendations for further research

This thesis has developed accident prediction models for urban unsignalized intersections that include independent variables such as traffic volumes and control type. It is recommended that these models be further refined by adding more variables, such as:

Intersection control type: An attempt to develop this model was made in this thesis, but the results showed a poor fit. Therefore, it is recommended that a larger sample be used to obtain a significant model, so that the safety effect of intersection control type can be assessed in the same way that this thesis assessed the safety effect of T and 4-leg intersections.

Intersection layout variables: Accident occurrence can be explained by several variables. Including intersection layout variables (e.g., number of lanes on each road, number of left- and right-turn lanes, pedestrian crosswalks, speed limit) should enhance our understanding of the relationships between accident occurrence and geometric design.

Accident type: In the safety evaluation of countermeasures it may be necessary to look at individual accident types (e.g., rear-end, right-angle) as opposed to the total number of accidents. Therefore, it is recommended that models for specific accident types be developed.

Finally, as explained earlier, more research is needed on ranking accident-prone locations. This is very important because, when a road authority has resources to address only a limited number of accident-prone locations, it must focus on those with the highest potential for accident reduction or those that deviate most from the normal safety levels for similar locations.

BIBLIOGRAPHY

Belanger, C. (1994). "Estimation of safety of four-leg unsignalized intersections", Transportation Research Record, 1467, Transportation Research Board, National Research Council, Washington, D.C., pp. 23-29.

Bonneson, J. A. and McCoy, P. T. (1993). "Estimation of safety at two-way stop-controlled intersections on rural highways", Transportation Research Record, 1401, Transportation Research Board, National Research Council, Washington, D.C., pp. 83-89.

Bonneson, J. A. and McCoy, P. T. (1997). "Effect of median treatment on urban arterial safety: An accident prediction model", Transportation Research Record, 1581, Transportation Research Board, National Research Council, Washington, D.C., pp. 27-36.

Brüde, U. and Larsson, J. (1988). "The use of prediction models for eliminating effects due to regression-to-the-mean in road accident data", Accident Analysis and Prevention, Vol. 20, No. 4, pp. 299-310.

Famoye, F. (1997). "Parameter estimation for generalized negative binomial distribution", Communications in Statistics, Part B: Simulation and Computation, Vol. 26, No. 1, pp. 269-279.

Feng, S. and Sayed, T. (1997). "Accident prediction models for signalized intersections", The University of British Columbia, Department of Civil Engineering. Report prepared for the Insurance Corporation of British Columbia.

Hauer, E., Ng, J. C. N. and Lovell, J. (1988). "Estimation of safety at signalized intersections", Transportation Research Record, 1185, Transportation Research Board, National Research Council, Washington, D.C., pp. 48-61.

Hauer, E. (1992). "Empirical Bayes approach to the estimation of 'unsafety': The multivariate regression method", Accident Analysis and Prevention, Vol. 24, No. 5, pp. 457-477.

Jovanis, P. P. and Chang, H. L. (1986). "Modeling the relationship of accidents to miles traveled", Transportation Research Record, 1068, Transportation Research Board, National Research Council, Washington, D.C., pp. 42-51.

Kulmala, R. and Roine, M. (1988). "Accident prediction models for two-lane roads in Finland", Proceedings of the Conference on Traffic Safety Theory and Research Methods, April, Session 4: Statistical analysis and models, SWOV, Amsterdam, pp. 89-103.
Kulmala, R. (1995). "Safety at rural three- and four-arm junctions: Development of accident prediction models", Technical Research Centre of Finland, VTT 233, Espoo.

Lawless, J. (1987). "Negative binomial and mixed Poisson regression", The Canadian Journal of Statistics, Vol. 15, No. 3, pp. 209-225.

McCullagh, P. and Nelder, J. A. (1983). "Generalized Linear Models", Chapman and Hall, New York.

Maher, M. J. and Summersgill, I. (1996). "A comprehensive methodology for the fitting of predictive accident models", Accident Analysis and Prevention, Vol. 28, No. 3, pp. 281-296.

Maycock, G. and Hall, R. D. (1984). "Accidents at 4-arm roundabouts", Transport and Road Research Laboratory, TRRL Laboratory Report 1120.

Miaou, S. and Lum, H. (1993). "Modeling vehicle accidents and highway geometric design relationships", Accident Analysis and Prevention, Vol. 25, No. 6, pp. 689-709.

Mountain, L. and Fawaz, B. (1996). "Estimating accidents at junctions using routinely-available input data", Traffic Engineering and Control, Vol. 37, No. 11, pp. 624-628.

Numerical Algorithms Group (NAG) (1994). "The GLIM System: Release 4 Manual", The Royal Statistical Society.

Numerical Algorithms Group (NAG) (1996). "GLIM 4 Macro Library Manual, Release 2", The Royal Statistical Society.

Saccomanno, F. F. and Buyco, C. (1988). "Generalized loglinear models of truck accident rates". Paper presented at the 67th Annual Meeting of the Transportation Research Board, Washington, D.C.

Satterthwaite, S. P. (1981). "A survey of research into relationships between traffic accidents and traffic volumes", Transport and Road Research Laboratory, TRRL Supplementary Report 692.

Stevens, J. (1986). "Applied multivariate statistics for the social sciences", Lawrence Erlbaum Associates, Hillsdale, N.J.
APPENDIX I RESULTS OF OUTLIERS IDENTIFICATION

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-1 Identification of the Highest Cook's Distance Values for Total Model with Intersection Type

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 157 | 426 | 390.69 | 3.63 | 3.63 | 3.84
2 | 45 | 425 | 389.49 | 1.20 | 4.83 | 5.99
3 | 51 | 424 | 388.36 | 1.13 | 5.96 | 7.81
4 | 14 | 423 | 387.19 | 1.17 | 7.13 | 9.49
5 | 44 | 422 | 385.66 | 1.53 | 8.66 | 11.07
6 | 22 | 421 | 383.85 | 1.81 | 10.47 | 12.59
7 | 5 | 420 | 382.25 | 1.60 | 12.07 | 14.07
8 | 224 | 419 | 380.45 | 1.80 | 13.87 | 15.51
Table AI-1 Identification of Outliers for Total Model with Intersection Type

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-2 Identification of the Highest Cook's Distance Values for T-Intersection Model

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 66 | 185 | 160.61 | 3.00 | 3.00 | 3.84
2 | 9 | 184 | 159.54 | 1.07 | 4.07 | 5.99
3 | 14 | 183 | 158.15 | 1.39 | 5.46 | 7.81
4 | 53 | 182 | 157.21 | 0.94 | 6.40 | 9.49
5 | 8 | 181 | 156.03 | 1.18 | 7.58 | 11.07
6 | 103 | 180 | 154.62 | 1.41 | 8.99 | 12.59
Table AI-2 Identification of Outliers for T-Intersection Model

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-3 Identification of the Highest Cook's Distance Values for 4-leg Intersection Model

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 12 | 240 | 229.12 | 1.18 | 1.18 | 3.84
2 | 20 | 239 | 227.18 | 1.94 | 3.12 | 5.99
3 | 3 | 238 | 225.32 | 1.86 | 4.98 | 7.81
4 | 29 | 237 | 223.38 | 1.94 | 6.92 | 9.49
5 | 223 | 236 | 221.94 | 1.44 | 8.36 | 11.07
6 | 159 | 235 | 219.83 | 2.11 | 10.47 | 12.59
7 | 133 | 234 | 218.25 | 1.58 | 12.05 | 14.07
8 | 231 | 233 | 217.27 | 0.98 | 13.03 | 15.51
Table AI-3 Identification of Outliers for 4-Leg Intersection Model

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-4 Identification of the Highest Cook's Distance Values for Vancouver Island Model

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 101 | 349 | 298.88 | 2.97 | 2.97 | 3.84
2 | 41 | 348 | 298.01 | 0.87 | 3.84 | 5.99
3 | 73 | 347 | 297.44 | 0.57 | 4.41 | 7.81
4 | 200 | 346 | 295.41 | 2.03 | 6.44 | 9.49
5 | 326 | 345 | 294.14 | 1.27 | 7.71 | 11.07
6 | 222 | 344 | 291.91 | 2.23 | 9.94 | 12.59
7 | 89 | 343 | 290.04 | 1.87 | 11.81 | 14.07
Table AI-4 Identification of Outliers for Vancouver Island Model

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-5 Identification of the Highest Cook's Distance Values for Lower Mainland Model

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 14 | 76 | 80.08 | 0.95 | 0.95 | 3.84
2 | 5 | 75 | 78.81 | 1.27 | 2.22 | 5.99
3 | 22 | 74 | 77.47 | 1.34 | 3.56 | 7.81
4 | 74 | 73 | 73.99 | 3.48 | 7.04 | 9.49
5 | 66 | 72 | 70.52 | 3.47 | 10.51 | 11.07
6 | 10 | 71 | 69.49 | 1.03 | 11.54 | 12.59
7 | 57 | 70 | 67.94 | 1.55 | 13.09 | 14.07
Table AI-5 Identification of Outliers for Lower Mainland Model

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-6 Identification of the Highest Cook's Distance Values for Surrey Total Model

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 14 | 55 | 54.60 | 1.12 | 1.12 | 3.84
2 | 5 | 54 | 53.14 | 1.45 | 2.58 | 5.99
3 | 4 | 53 | 52.26 | 0.88 | 3.45 | 7.81
Table AI-6 Identification of Outliers for Surrey Total Model

[Figure not reproduced: Cook's distance plotted against observed accidents (acc/3 years); marked points denote high Cook's distance]
Figure AI-7 Identification of the Highest Cook's Distance Values for Surrey Total Model with Intersection Control Type

Rank (Cook's Distance) | Intersection Number | Sample Size | Scaled Dev. | SD Drop | Cumulative SD Drop | χ²
1 | 14 | 55 | 54.40 | 1.08 | 1.08 | 3.84
2 | 51 | 54 | 53.40 | 1.00 | 2.08 | 5.99
3 | 45 | 53 | 52.40 | 1.00 | 3.08 | 7.81
4 | 10 | 52 | 51.49 | 0.90 | 3.98 | 9.49
5 | 5 | 51 | 50.15 | 1.35 | 5.33 | 11.07
6 | 49 | 50 | 49.35 | 0.80 | 6.13 | 12.59
7 | 4 | 49 | 48.40 | 0.95 | 7.08 | 14.07
Table AI-7 Identification of Outliers for Surrey Total Model with Intersection Control Type

APPENDIX II PREDICTION RATIO VS.
ACCIDENT FREQUENCY

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs); points coded by city (Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo)]
Figure AII-1 AR vs. Accident Frequency for Total Model

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs)]
Figure AII-2 AR vs. Accident Frequency for T-intersection Model

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs); points coded by city (Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo)]
Figure AII-3 AR vs. Accident Frequency for 4-leg Intersection Model

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs); points coded by city (Surrey, Victoria, Coquitlam, Vancouver, Burnaby, Nanaimo)]
Figure AII-4 AR vs. Accident Frequency for Total Model Including Intersection Type

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs); points coded by city (Victoria, Nanaimo)]
Figure AII-5 AR vs. Accident Frequency for Vancouver Island Model

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs); points coded by city (Surrey, Coquitlam, Vancouver, Burnaby)]
Figure AII-6 AR vs. Accident Frequency for Lower Mainland Model

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs)]
Figure AII-7 AR vs. Accident Frequency for Surrey Total Model

[Figure not reproduced: AR plotted against predicted accidents (acc/3 yrs)]
Figure AII-8 AR vs.
Accident Frequency for Surrey Model with Control Type  100  APPENDIX III GLIM SESSION FOR ESTIMATING APM  [o] [o] [o] [i] [i] [i] [i] [i] [i] [i] [i] [o] [o] [o] [o] [o] [o] [o] [o] [o] [i] [ ] e  [ ] e  [e] [h] [i] [i] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [o] [i] [i]  GLIM 4, update 8 f o r IBM e t c . 80386 PC / DOS on 28-Oct-1997 a t 09:33:59 (copyright) 1992 Royal S t a t i s t i c a l S o c i e t y , London ? $C ACCIDENT PREDICTION MODELS FOR UNSIGNALIZED INTERSECTIONS$ ? $C TOTAL MODEL$ ? $Units 427$ ? $Data VI V2 T o t a l T o t a l _ 3 y r Type$ ? $Dinput ' u n s i g a l l . t x t ' $ ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? $ F i t LV1+LV2 $D E$ s c a l e d d e v i a n c e = 1576.8 a t c y c l e 4 r e s i d u a l df = 424 estimate 1 0.3943 2 0.4067 3 0.6086 s c a l e parameter 1.000  s.e. 0.07586 0.02814 0.02636  parameter 1 LV1 LV2  ? $Input % p l c 80 NEGBIN.MAC$ i*********************************************************** i*********************************************************  $echo o f f $ ? $Number theta=0$ ? $Use negbin t h e t a $D E$ s c a l e d deviance = 398.83 (change = r e s i d u a l d f = 424 (change = ML E s t i m a t e o f THETA = Std Error = (  -1178.) a t c y c l e 3 0 )  1.966 0.1828)  NOTE: s t a n d a r d e r r o r s of f i x e d e f f e c t s do not take account o f the e s t i m a t i o n of THETA 2 x Log-likelihood 2 x Full Log-likelihood estimate 1 0.4007 2 0.3839 3 0.7044 s c a l e parameter 1.000  = =  s.e. 0.1233 0.04940 0.05676  4697. on 424 df -2138. parameter 1 LV1 LV2  ? $C END OF TOTAL MODEL$ ? $C TOTAL MODEL INCLUDING INTERSECTION TYPE$  101  Appendix III: GLIM Session for Estimating APM  ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ -- model changed ? 
$ F i t LVl+LV2+Type $D E$ s c a l e d d e v i a n c e = 1435.8 a t c y c l e < r e s i d u a l df = 423 o] estimate o] 1 -0.5266 o] 2 0.4336 o] .3 0.5937 4 0.5268 o] o] s c a l e parameter 1.000  s. e .0.1114 0 . 02801 0 . 02721 0 . 04564  parameter 1 LV1 LV2 TYPE  .0]  i ] ? $Number theta=0$ i] ? $Use negbin t h e t a $D E$ .0] s c a l e d d e v i a n c e = 394 3 2 (change 0] r e s i d u a l d f = 423 (change .OJ  .0] .0]  -1042. 0  at c y c l e 2  ML E s t i m a t e o f THETA = 2 .228 Std E r r o r = ( 0. 2170) NOTE: s t a n d a r d e r r o r s o f f i x e d e f f e c t s do not take account of the e s t i m a t i o n o f THETA  .0] .0]  2 x Log- l i k e l i h o o d 2 x Full Log-likelihood  .0J  estimate 1 -0. 5488 2 0.4221 0] 0] 3 0 . 6480 '0] 4 0 . 5379 .0] s c a l e parameter 1.000 .0] .0]  =  =  s. e. 0 . 1920 0 . 04843 0 . 05534 0 . 08504  4734. on 423 -2100. parameter 1 LV1 LV2 TYPE  ? $C END OF TOTAL MODEL INCLUDING INTERSECTION TYPE$ ? $C MODEL FOR T INTERSECTIONS$ ? $ U n i t s 186$ ? $Data VI V2 T o t a l T o t a l _ 3 y r $ ? $Dinput ' u n s i g t . t x t ' $ ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? $ F i t LV1+LV2 $D E$ s c a l e d d e v i a n c e = 485.10 a t c y c l e 4 r e s i d u a l d f = 183 estimate 1 -0.3980 2 0.5809 3 0.5902 s c a l e parameter 1.000  s.e. 0.1675 0.05957 0.04250  parameter 1 LV1 LV2  ? $Input % p l c 80 NEGBIN.MAC$ 1*********************************************************** 1*********************************************************  $echo o f f $  102  Appendix III: GLIM Session for Estimating APM  h] i] i] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o]  ? $Number theta=0$ ? 
$Use negbin t h e t a $D E$ s c a l e d deviance = 163.61 (change = r e s i d u a l d f = 183 (change = ML E s t i m a t e o f THETA = Std Error = (  -321.5) a t c y c l e 4 0 )  2.345 0.3825)  NOTE: s t a n d a r d e r r o r s o f f i x e d e f f e c t s do not take account o f the e s t i m a t i o n of THETA 2 x Log-likelihood 2 x Full Log-likelihood estimate 1 -0.06907 2 0.4531 3 0.5856 s c a l e parameter 1.000  = =  995.9 on 183 d f -805.6  s.e. 0.2149 0.08263 0.07892  parameter 1 LV1 LV2  ;i] ? $C END OF T INTERSECTION MODEL$ i ] ? $C FOUR LEGGED INTERSECTION MODEL$ ;i] ;i] ;i] ;i] i] ;i] o] ^o] o] o] o] ;o] o] 'o] io] l] g] g]  >] :h] ii] ii] o] io] .o] [o] !o] 'o] o] o] 'o] ;o]  ? $ U n i t s 241$ ? $Data VI V2 T o t a l T o t a l _ 3 y r $ ? $Dinput 'unsig4.txt'$ ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? $ F i t LV1+LV2 $D E$ s c a l e d deviance = 942.2 9 a t c y c l e 4 r e s i d u a l d f = 238 estimate 1 0.6422 2 0.3884 3 0.5944 s c a l e parameter 1.000 ? $Input  s.e. 0.08496 0.03194 0.03557  parameter 1 LV1 LV2  % p l c 80 NEGBIN.MAC$  i*********************************************************** i*******************************************************  $echo o f f $ ? $Number theta=0$ ? $Use negbin t h e t a $D E$ s c a l e d deviance = 230.30 (change = r e s i d u a l d f = 23 8 (change = ML E s t i m a t e o f THETA = Std Error = (  -712.0) a t c y c l e 3 0 )  2.172 0.2647)  NOTE: s t a n d a r d e r r o r s o f f i x e d e f f e c t s do not ' take account o f the e s t i m a t i o n of THETA 2 x Log-likelihood  =  3740. on 238 d f  103  Appendix III: GLIM Session for Estimating APM  o] 2 x Full Log-likelihood = -1293. o] o] estimate s.e. parameter o] 1 0.5275 0.1481 1 o] 2 0.4099 0.06025 LV1 o] 3 0.7065 0.07740 LV2 o] s c a l e parameter 1.000 o]  i] i]  ? $C END OF FOUR LEGGED INTERSECTION MODEL$ ? $C ISLAND MODEL$  i] i] i] i] i] i] o] o] o] o] o] o] o] o] o] i]  ? $Units 350$ ? 
$Data VI V2 T o t a l T o t a l _ 3 y r $ ? $Dinput i s l a n d . t x t $ ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? $ F i t LV1+LV2 $D E$ s c a l e d deviance = 734.32 a t c y c l e 4 r e s i d u a l d f = 347  gi  1  1  estimate 1 0.2872 2 0.3231 3 0.5240 s c a l e parameter 1.000 ? $Input  s.e. 0.09310 0.03632 0.03783  parameter 1 LV1 LV2  % p l c 80 NEGBIN.MAC$  i**************************************************  gi  i * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *  e] h] i] i] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o] o]  $echo o f f $ ? $Number theta=0$ ? $Use negbin t h e t a $D E$ s c a l e d deviance = 301.85 (change = r e s i d u a l d f = 347 (change = ML E s t i m a t e of THETA = Std Error = (  -432.5) a t c y c l e 2 0 )  2.920 0.3861)  NOTE: s t a n d a r d e r r o r s of f i x e d e f f e c t s do not take account o f the e s t i m a t i o n o f THETA 2 x Log-likelihood 2 x Full Log-likelihood estimate 1 0.3226 2 0.3042 3 0.5488 s c a l e parameter 1.000  = =  s.e. 0.1224 0.05025 0.06123  i] i]  ? $C END OF ISLAND MODEL$ ? $C MAINLAND MODEL$  i] i] i]  ? $ U n i t s 77$ ? $Data VI V2 T o t a l T o t a l _ 3 y r $ ? $Dinput 'mainland.txt'$  985.9 on 347 df -1473. parameter 1 LV1 LV2  104  Appendix III: GLIM Session for Estimating APM  i] i] i] o] o] o] o] o] o] o] o] o] i] g]  ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? $ F i t LV1+LV2 $D E$ s c a l e d deviance = 249.82 a t c y c l e 4 r e s i d u a l df = 74 estimate 1 1.912 2 0.2036 3 0.2474 s c a l e parameter 1.000  s.e. 0.1410 0.04688 0.04259  parameter 1 LV1 LV2  ? $Input % p l c 80 NEGBIN.MAC$ r****************************************  e] $echo o f f $ h] i ] ? $Number theta=0$ i ] ? 
$Use negbin t h e t a $D E$ 'o] s c a l e d d e v i a n c e = 81.024 (change = -168.8) a t c y c l e 3 o] r e s i d u a l d f = 74 (change = 0 ) o] o] ML E s t i m a t e o f THETA = 6.265 o] Std Error = ( 1.486) o] o] NOTE: s t a n d a r d e r r o r s o f f i x e d e f f e c t s do not o] take account o f the e s t i m a t i o n of THETA o] o] 2 x Log-likelihood = 3870. on 74 d f o] 2 x Full Log-likelihood = -505.9 o] o] estimate s.e. parameter o] 1 1.886 0.2478 1 o] 2 0.2011 0.08382 LV1 o] 3 0.2864 0.07817 LV2 o] s c a l e parameter 1.000 o]  i] i]  ? $C END OF MAINLAND MODEL$ ? $C TOTAL MODEL FOR SURREY$  i] ;i] ii] i] i] i] o] o] o] 'o] o] o] o] o] 'o] i]  ? $Units 56$ ? $Da VI V2 T o t a l _ 3 y r T C o n t r o l $ ? $Dinput 'unsry.txt'$ ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? $ F i t LV1+LV2 $D E$ s c a l e d deviance = 151.51 a t c y c l e 3 r e s i d u a l df = 53 estimate 1 2.148 2 0.1529 3 0.1720 s c a l e parameter 1.000  s.e. 0.1532 0.05082 0.05030  parameter 1 LV1 LV2  ? $Input % p l c 80 NEGBIN.MAC$  105  Appendix III: GLIM Session for Estimating APM  i********************************************************  $echo o f f $ ? $Numer theta=0$ ? $Use negbin t h e t a $D E$ s c a l e d deviance = 55.716 (change = r e s i d u a l d f = 53 (change = ML E s t i m a t e of THETA = Std E r r o r = (  -95.80) a t c y c l e 3 0 )  8.893 2.639)  NOTE: s t a n d a r d e r r o r s of f i x e d e f f e c t s do not take account o f the e s t i m a t i o n of THETA 2 x Log-likelihood 2 x Full Log-likelihood estimate 2 .133 0.1516 0.1907 s c a l e parameter 1.000  3007. on 53 df -360 . 5 parameter 1 LV1 LV2  s.e. 0 .2439 0.08195 0.08343  ? $C END OF SURREY TOTAL MODEL$ ? $C MODEL FOR SURREY INTERSECTION INCLUDING TYPE OF CONTROL$ ? $Units 56$ ? $Da VI V2 T o t a l _ 3 y r T C o n t r o l $ ? $Dinput 'unsry.txt'$ ? $Calc L V l = % l o g ( V l ) : LV2=%log(V2)$ ? $Yvar T o t a l _ 3 y r $ E r r o r P $Link L$ ? 
$ F i t LVl+LV2+TControl $D E$ s c a l e d deviance = 149.55 a t c y c l e 3 r e s i d u a l df = 52 estimate 1 2.191 2 0.1647 3 0.1958 4 -0.05703 s c a l e parameter 1.000 ? $Input  s.e. 0 .1571 0 . 05173 0 . 05310 0 . 04071  parameter 1 LV1 LV2 TCONTROL  % p l c 8 0 NEGBIN.MAC$  i***********************************************************  j************************************************************ $echo o f f $ ? $Number theta=0$ ? $Use negbin t h e t a $D E$ s c a l e d deviance = 55.476 (change = r e s i d u a l d f = 52 (change = ML E s t i m a t e o f THETA = Std E r r o r = (  -94.07) a t c y c l e 3 0 )  9.096 2 . 714)  106  Appendix III: GLIM Session for Estimating APM  [o] NOTE: s t a n d a r d e r r o r s of f i x e d e f f e c t s do not [o] take account o f the e s t i m a t i o n o f THETA [o] [o] 2 x Log-likelihood = 3008. on 52 df [o] 2 x Full Log-likelihood = -359.4 [o] [o] estimate s.e. parameter [o] 1 2.185 0.2491 1 [o] 2 0.1645 0.08265 LV1 [o] 3 0.2256 0.08752 LV2 [o] 4 -0.06994 0.06764 TCONTROL [o] s c a l e parameter 1.000 [o] [i] ? $C END OP SURREY MODEL$ [i] ? $Stop  107  APPENDIX IV GLIM SESSION FOR DIFFERENT NEGATIVE BINOMIAL METHODS  [o] GLIM 4, update 8 f o r IBM e t c . 80386 PC / DOS on 09-NOV-1997 a t 01:14:42 [o] (copyright) 1992 Royal S t a t i s t i c a l S o c i e t y , London [o]  [i] [i]  ? $!TOTAL MODEL ESTIMATION OF NEGATIVE BINOMIAL DISTRIBUTION PARAMETERS! ? $!METHOD OF MAXIMUM LIKELIHOOD (NAG, 1996)!  [i] [i] [i] [i] [i] [i] [o] [o] [o] [o] [o] [o] [o] [o] [o] [i]  ? $ u n i t s 427$ ? $data v l v2 t o t a l t o t a l _ 3 y r type$ ? $dinput ' u n s i g a l l . t x t ' $ ? $ c a l c l v l = % l o g ( v l ) : Lv2=%log(V2)$ ? $Yvar t o t a l _ 3 y r $ e r r o r P $ l i n k L$ ? $ F i t LV1+LV2 $D E$ s c a l e d deviance = 1576.8 at c y c l e 4 r e s i d u a l df = 424  [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e] [e]  ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !  
[o]    estimate      s.e.     parameter
[o] 1    0.3943   0.07586     1
[o] 2    0.4067   0.02814     LV1
[o] 3    0.6086   0.02636     LV2
[o] scale parameter 1.000
[o]
[i] ? $input %plc 80 NEGBIN.MAC$
[e] !******************************************************
[e] ! Author:  John Hinde, MSOR Department, University of Exeter
[e] !          jph@msor.ex.ac.uk
[e] ! Version: 1.1 GLIM4 February 1996
[e] !
[e] ! Main Macros:
[e] ! NEGBIN   Fits a negative binomial distribution for overdispersed
[e] !          count data. For details on the negative binomial
[e] !          distribution see Lawless (1987) Canadian J. of Stats,
[e] !          15, 209-225.
[e] !          The overdispersion parameter theta can be fixed or
[e] !          estimated, using an inner loop embedded within the model
[e] !          fitting process. If the specified parameter value is
[e] !          zero, estimation is performed using either maximum
[e] !          likelihood (default), the expected value of the
[e] !          chi-squared statistic as in Breslow, N.E. (1984) Applied
[e] !          Statistics 33, p38-44, or the mean deviance.
[e] !          Prior to using this macro the following model aspects
[e] !          need to be declared:
[e] !          y-variate:       use $YVAR <yvariate>
[e] !          model formulae:  this will be taken from the last fit
[e] !                           directive, or can be explicitly set using
Appendix IV: GLIM Session for Different Negative Binomial Methods
[e] !                           $TERMS <model formula>
[e] !          link function:   set using $LINK
[e] !                           permissible values i, l, s
[e] !
[e] ! Formal arguments:
[e] !   theta  (obligatory) scalar for negative binomial
[e] !          parameter estimate
[e] !          if theta=0 estimation is performed
[e] !          if theta/=0 used as fixed value in negative
[e] !          binomial fit
[e] !   method (optional) Scalar controlling estimation method when
[e] !          appropriate
[e] !          1 = maximum likelihood (default if theta=0)
[e] !          2 = mean chi-square estimation
[e] !          3 = mean deviance estimation
[e] !          4 = use fixed value of theta (default if theta/=0)
[e] !   tol    (optional) Scalar specifies tolerance criterion to
[e] !          control convergence of iteration on theta.
[e] !          Defaults to 0.0001.
[e] !          If tol<=0 then convergence criterion is set to %cc,
[e] !          the system convergence criterion.
[e] !
[e] ! Output:
[e] !   Displays the negative binomial deviance, the degrees of freedom
[e] !   for the fitted regression model, the estimate of theta, its
[e] !   standard error when using maximum likelihood estimation,
[e] !   and values of the log-likelihood. The deviance provides a
[e] !   goodness-of-fit measure for a negative binomial
[e] !   distribution with the current value of theta.
[e] !   When theta is fixed, deviance differences can be used to
[e] !   assess the importance of model terms.
[e] !   To compare models with different values of theta the
[e] !   log-likelihood must be used.
[e] !   In particular, this applies for comparisons with
[e] !   the standard Poisson model (theta = infinity).
[e] !   The log-likelihoods are those for the negative binomial
[e] !   distribution, the full version including the y! terms.
[e] !
[e] ! Side Effects:
[e] !   On exit from the macro the model is still defined with
[e] !   a negative binomial variance function. Submodels can then
[e] !   be fitted directly with $FIT directives. This will work
[e] !   fine following a fixed parameter fit, but should be
[e] !   used with caution if theta was estimated - use of $RECYCLE
[e] !   could help things in this case.
[e] !
[e] ! Example of use:
[e] !   $yvar y $link l $terms 11$
[e] !   $number theta=0 $
[e] !   $use negbin theta$
[e] !
[e] ! NB_OUT   Can be used after subsequent $FIT directives to obtain
[e] !          output given by NEGBIN, i.e. the estimate of theta, its
[e] !          standard error for maximum likelihood fits and the
[e] !          log-likelihood values.
[e] !
[e] ! Formal arguments:
[e] !   theta  (obligatory) scalar for negative binomial
[e] !          parameter estimate
[e] !
[e] ! Example of use:
[e] !   $yvar y $link l $terms 11$
[e] !   $number theta=0 $
[e] !   $use negbin theta$
[e] !   $recy $fit -11$
[e] !   $use nb_out$
[e] !
[e] ! To delete macros and global variables, type
[e] !   $delete #d_negbin d_negbin $
[e] !******************************************************
[e] $echo off $
[f] ** identifier expected but not found, at [80 NEGBIN.]
[f]
[h] The $INP directive expected an identifier but found the character instead.
[h] Check the syntax of the directive.
[h]
[i] ? $number theta=0$
[i] ? $number mode=1$
[i] ? $use negbin theta mode $D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 398.83 (change = -1178.) at cycle 3
[o]     residual df = 424    (change = 0     )
[o]
[o] ML Estimate of THETA =  1.966
[o]           Std Error = ( 0.1828)
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o]      2 x Log-likelihood =  4697. on 424 df
[o] 2 x Full Log-likelihood = -2138.
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.4007    0.1233     1
[o] 2    0.3839   0.04940     LV1
[o] 3    0.7044   0.05676     LV2
[o] scale parameter 1.000
[o]
[i] ? $!METHOD OF MEAN CHI-SQUARE (NAG, 1996)!
[i] ? $number theta=0 : mode=2$
[i] ? $use negbin theta mode $D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 370.24 (change = -1207.) at cycle 3
[o]     residual df = 424    (change = 0     )
[o]
[o] Mean Chi-squared estimate of THETA = 1.764
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o]      2 x Log-likelihood =  4696. on 424 df
[o] 2 x Full Log-likelihood = -2139.
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.4030    0.1269     1
[o] 2    0.3827   0.05104     LV1
[o] 3    0.7058   0.05905     LV2
[o] scale parameter 1.000
[o]
[i] ? $!METHOD OF MEAN DEVIANCE (NAG, 1996)!
[i] ? $number theta=0 : mode=3$
[i] ? $use negbin theta mode $D E$
[w] -- model changed
[w] -- model changed
[o] scaled deviance = 424.00 (change = -1153.) at cycle 2
[o]     residual df = 424    (change = 0     )
[o]
[o] Mean Deviance estimate of THETA = 2.154
[o]
[o] NOTE: standard errors of fixed effects do not
[o]       take account of the estimation of THETA
[o]
[o]      2 x Log-likelihood =  4696. on 424 df
[o] 2 x Full Log-likelihood = -2139.
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.3991    0.1201     1
[o] 2    0.3850   0.04803     LV1
[o] 3    0.7023   0.05490     LV2
[o] scale parameter 1.000
[o]
[i] ? $!TOTAL MODEL ESTIMATION OF NEGATIVE BINOMIAL DISTRIBUTION PARAMETERS!
[i] ? $!FOLLOWING THE METHOD OF MOMENTS PROPOSED BY KULMALA(1995) AND MAYCOCK!
[i] ? $! AND HALL (1984)!
[i] ? $Units 427$
[i] ? $Da V1 V2 Total Total_3yr Type$
[i] ? $Dinput 'unsigall.txt'$
[i] ? $Calc LV1=%log(V1) : LV2=%log(V2)$
[i] ? $Yvar Total_3yr $Error P $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] scaled deviance = 1576.8 at cycle 4
[o]     residual df = 424
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.3943   0.07586     1
[o] 2    0.4067   0.02814     LV1
[o] 3    0.6086   0.02636     LV2
[o] scale parameter 1.000
[o]
[i] ? $Number k=1.733$
[i] ? $MACRO NEGBIN!
[i] $MAC? $Calc %va=%fv+(%fv**2)/k$
[i] $MAC? $Calc %di=2*(%yv*%log(%yv/%fv)-(%yv+k)*%log((%yv+k)/(%fv+k)))$
[i] $MAC? $ENDMAC$
[i] ? $Edit 74 Total_3yr 0.0001 : 118 Total_3yr 0.0001 : 228 Total_3yr 0.0001$
[w] -- change to data values affects model
[i] ? $Edit 238 Total_3yr 0.0001 : 375 Total_3yr 0.0001$
[i] ? $Yvar Total_3yr $Error Own NEGBIN $Link L$
[i] ? $Fit LV1+LV2 $D E$
[o] deviance = 365.66 at cycle 5
[o] residual df = 424
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.4033    0.1184     1
[o] 2    0.3825   0.04765     LV1
[o] 3    0.7061   0.05520     LV2
[o] scale parameter 0.8624
[o]
[i] ? $Number k=1.8474$
[i] ? $Yvar Total_3yr $Error Own NEGBIN $Link L$
[w] -- model changed
[i] ? $Fit LV1+LV2 $D E$
[o] deviance = 382.23 at cycle 5
[o] residual df = 424
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.4019    0.1190     1
[o] 2    0.3832   0.04779     LV1
[o] 3    0.7054   0.05513     LV2
[o] scale parameter 0.9015
[o]
[i] ? $Number k=1.8472$
[i] ? $Yvar Total_3yr $Error Own NEGBIN $Link L$
[w] -- model changed
[i] ? $Fit LV1+LV2 $D E$
[o] deviance = 382.20 at cycle 5
[o] residual df = 424
[o]
[o]    estimate      s.e.     parameter
[o] 1    0.4019    0.1190     1
[o] 2    0.3832   0.04779     LV1
[o] 3    0.7054   0.05513     LV2
[o] scale parameter 0.9014
[o]
[i] ? $Stop
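The `$Error Own NEGBIN` fits at the end of the session depend on two user-defined macro quantities. The first is the variance function `%va = %fv + %fv**2/k`, that is mu + mu^2/k. The second is the deviance increment `%di`.

As a reading aid, here is a minimal Python sketch of the same two formulas for a fixed k. The function names are hypothetical and not part of GLIM. The sketch differs from the session in one respect. The session uses `$Edit` to set the counts at units 74, 118, 228, 238 and 375 to 0.0001, presumably zero-accident sites, so that `%log(%yv/%fv)` stays defined. The sketch instead uses the limit y*log(y/mu) -> 0 at y = 0. The printed scale parameters (0.8624, 0.9015, 0.9014) agree with deviance divided by the 424 residual degrees of freedom, and the final loop reproduces that arithmetic.

```python
import math
from typing import Sequence


def nb_variance(mu: float, k: float) -> float:
    """Negative binomial variance, as in the macro line %va=%fv+(%fv**2)/k."""
    return mu + mu * mu / k


def nb_deviance_term(y: float, mu: float, k: float) -> float:
    """Deviance increment, as in
    %di=2*(%yv*%log(%yv/%fv)-(%yv+k)*%log((%yv+k)/(%fv+k))).
    For y == 0 the first term is replaced by its limit, 0, instead of
    editing the count to 0.0001 as the GLIM session does."""
    first = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (first - (y + k) * math.log((y + k) / (mu + k)))


def nb_deviance(ys: Sequence[float], mus: Sequence[float], k: float) -> float:
    """Total deviance: the sum of the per-site increments."""
    return sum(nb_deviance_term(y, mu, k) for y, mu in zip(ys, mus))


def scale_parameter(deviance: float, n_units: int, n_params: int) -> float:
    """Deviance divided by residual df; this matches the scale
    parameters printed for the own-error fits in the session."""
    return deviance / (n_units - n_params)


# Residual df in the session: 427 units - 3 parameters = 424.
for dev in (365.66, 382.23, 382.20):
    print(f"{dev:.2f} / 424 = {scale_parameter(dev, 427, 3):.4f}")
```

As k grows without bound, each increment approaches the Poisson deviance term 2[y log(y/mu) - (y - mu)]. This is consistent with the Poisson model corresponding to theta = infinity in the NEGBIN documentation.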

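With theta set to 0, NEGBIN estimates theta in one of three ways:

- maximum likelihood (mode=1),
- mean chi-square (mode=2), which sets the Pearson statistic equal to its degrees of freedom,
- mean deviance (mode=3), which sets the deviance equal to its degrees of freedom.

In the session these give 1.966, 1.764 and 2.154. The mean-deviance fit ends with a scaled deviance of 424.00 on 424 residual df, which is exactly that method's defining condition.

The sketch below writes each criterion as a one-dimensional root search in theta. It simplifies on purpose. NEGBIN re-fits the regression inside its inner loop, whereas the sketch holds the fitted means fixed. It also uses plain bisection on an assumed bracket [1e-4, 1e4]. All function names are illustrative, not the macro's. The deviance formula from the own-error macro is repeated so the block runs on its own, and the toy data are hypothetical, not the thesis data.

```python
import math
from typing import Callable


def bisect(f: Callable[[float], float], lo: float, hi: float,
           tol: float = 1e-10, max_iter: int = 200) -> float:
    """Plain bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid   # root lies in [mid, hi]
        else:
            hi = mid                # root lies in [lo, mid]
    return 0.5 * (lo + hi)


def pearson_chi2(ys, mus, theta):
    """Pearson statistic using the NB variance mu + mu**2/theta."""
    return sum((y - m) ** 2 / (m + m * m / theta) for y, m in zip(ys, mus))


def nb_deviance(ys, mus, theta):
    """NB deviance, same formula as the own-error macro with k = theta."""
    total = 0.0
    for y, m in zip(ys, mus):
        first = y * math.log(y / m) if y > 0 else 0.0
        total += 2.0 * (first - (y + theta) * math.log((y + theta) / (m + theta)))
    return total


def nb_loglik(ys, mus, theta):
    """Full NB log-likelihood, including the log(y!) terms."""
    return sum(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (m + theta)) + y * math.log(m / (m + theta))
        for y, m in zip(ys, mus))


def nb_score(ys, mus, theta):
    """d(log-likelihood)/d(theta) for integer counts, using
    digamma(y + theta) - digamma(theta) = sum_{j<y} 1/(theta + j)."""
    total = 0.0
    for y, m in zip(ys, mus):
        total += sum(1.0 / (theta + j) for j in range(int(y)))
        # log(theta) + 1 - log(m + theta) - (y + theta)/(m + theta), rearranged
        total += -math.log1p(m / theta) + (m - y) / (m + theta)
    return total


def estimate_theta(ys, mus, df, method, lo=1e-4, hi=1e4):
    """Estimate theta with the fitted means held fixed (a simplification)."""
    if method == "ml":          # mode=1: solve score(theta) = 0
        return bisect(lambda t: nb_score(ys, mus, t), lo, hi)
    if method == "chi2":        # mode=2: Pearson chi-square = df
        return bisect(lambda t: pearson_chi2(ys, mus, t) - df, lo, hi)
    if method == "deviance":    # mode=3: deviance = df
        return bisect(lambda t: nb_deviance(ys, mus, t) - df, lo, hi)
    raise ValueError(f"unknown method: {method!r}")


# Toy data (hypothetical, not the thesis data): ten sites, common mean 2.
ys = [0, 0, 0, 1, 2, 2, 5, 6, 0, 4]
mus = [2.0] * len(ys)
for method in ("ml", "chi2", "deviance"):
    print(method, round(estimate_theta(ys, mus, df=9, method=method), 3))
```

With a common mean the mean chi-square criterion has a closed form. Here 46/(2 + 4/theta) = 9 gives theta = 9/7, which makes a convenient check on the root search.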