Negative Binomial Regression Model for Road Crash Severity Prediction

In this paper, the Negative Binomial Regression (NBR) technique was used to develop a crash severity prediction model for Jordan. The primary crash data were obtained from the Jordan Traffic Institute for the year 2014 and included the number and severity of crashes. The data were organized into eight crash contributing factors: age, age and gender, drivers' faults, environmental factors, crash time, speed, roadway defects and vehicle defects. First, a descriptive analysis of the crash contributing factors was carried out to identify and quantify the factors affecting crash severity. Then the NBR technique, implemented in the R statistical software, was used to develop a crash prediction model linking crash severity to the identified factors. The NBR model results indicated that severe crashes decreased significantly as the age of both male and female drivers increased. They also decreased significantly as environmental conditions improved. In addition, severe crashes were significantly more frequent on weekdays than at weekends and in the morning than in the evening. The results also indicated that severe crashes increased significantly when drivers committed faults while driving. Mirror and brake defects were the only vehicle defect factors that contributed significantly to severe crashes. Finally, the results of the NBR model were found to agree with the descriptive analysis of the crash contributing factors.


Introduction
Motor vehicle travel is the primary mode of transportation in the world. It provides a high degree of mobility, but the resulting road traffic crashes are considered a leading cause of death, with a disproportionate number occurring in developing countries (WHO, 2014; Murray & Lopez, 1996). Jordan, a developing country, suffers from this problem. Jordan Traffic Institute statistics reported that 144,521 traffic crashes occurred during 2016, in which 750 people lost their lives and 17,435 people were injured. The fatality rate in the same year was 2.1 fatalities per day, 4.99 fatalities per 10,000 vehicles and 7.65 fatalities per 100,000 population. The injury rate for the same year was 47.8 injuries per day, 116 injuries per 10,000 vehicles and 177.9 injuries per 100,000 population, with an estimated crash cost of JD 323 million (JTI, 2016).
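The rates quoted above are exposure-normalized counts: a count divided by an exposure base and scaled to a convenient unit. The sketch below reproduces the two per-day figures from the 2016 counts. Dividing by 365 days (rather than 366, although 2016 was a leap year) is what reproduces the quoted 2.1 and 47.8, so that choice is inferred from the numbers. The vehicle-fleet and population denominators are not stated in the text, so the helper `rate_per` (a hypothetical name) simply takes them as inputs.

```python
def rate_per(count: float, base: float, per: float = 1.0) -> float:
    """Return count / base, scaled to `per` units of exposure."""
    return count / base * per


fatalities_2016 = 750
injuries_2016 = 17_435

# Per-day rates; a 365-day year reproduces the figures quoted from JTI (2016).
fatalities_per_day = rate_per(fatalities_2016, 365)
injuries_per_day = rate_per(injuries_2016, 365)
print(round(fatalities_per_day, 1), round(injuries_per_day, 1))  # 2.1 47.8

# Per-10,000-vehicle or per-100,000-population rates use the same helper,
# e.g. rate_per(fatalities_2016, registered_vehicles, 10_000), once the
# (not stated here) denominator is supplied.
```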
Table 1 shows the number of crashes, fatalities, registered vehicles, population and crash cost for the period 2007-2016. Over this ten-year period, vehicle ownership increased by almost 78.50 percent, accompanied by a 57.64 percent reduction in the fatality rate per 10,000 vehicles and a 55.49 percent reduction in the fatality rate per 10,000 persons. The latter may be attributed to the large increase in population over the study period (JTI, 2016), which may have resulted from the political situation in Jordan's neighboring countries and the consequent increase in the number of arrivals to Jordan. Table 1 also shows that traffic crashes are very costly for a developing country with limited resources such as Jordan. The average cost of traffic crashes in Jordan over the ten-year period was estimated at about JD 277.25 million, equivalent to about 0.72 percent of Jordan's gross domestic product of about JD 54.44 billion (Global Finance, 2016). The objectives of this research are to identify and quantify crash contributing factors by analyzing the necessary crash data, and to develop a Crash Prediction Model (CPM) that links crash severity to the identified factors.

Literature Review
Researchers have constantly sought ways to improve traffic safety and to better understand and predict crash likelihood under different crash contributing factors. Some researchers have specifically investigated crash contributing factors involving the geometric design features and design consistency of roads (Anderson et al., 1999; Montella, 2008). Anastasopoulos et al. (2012) investigated road pavement quality as a crash contributing factor, using a tobit regression model to predict accident rates per mile driven instead of crash frequency per unit time. Vogel and Bester (2005) studied the relationship between crash types and causes. They found that human factors contribute the most to traffic crashes, accounting for 75.40 % of the total crashes, followed by environmental factors (14.50 %) and vehicle factors (10.20 %). Spainhour et al. (2005) studied the causes of fatal traffic crashes in Florida, USA, and found that human factors, mainly driving under the influence, were the main factor, accounting for about 94.00 % of the total fatal crashes. Khan and Tehreem (2012) investigated the causes of road traffic crashes in Pakistan and found that the main causes were unskilled drivers, poor road conditions, cell phone use while driving, and overloading. Jadaan et al. (2013) developed a road fatality prediction model for Jordan using multiple linear regression and found that weather, road pavement, road surface and light conditions had no major effect on fatal traffic crashes. Naghawi (2017) used the quasi-induced exposure method to study young drivers' crashes in Jordan and found that young drivers' crash risk increased under poor environmental conditions.
Many statistical methods have been used in crash modeling and prediction. Crash prediction models (CPMs) are very useful tools in traffic safety because they can determine the relationship between the frequency and/or severity of crashes and crash contributing factors. Early CPMs were generally based on linear regression, assuming a normally distributed error term, constant residual variance and a linear relationship between the dependent and independent variables (Abbas et al., 2011; Obaidat & Ramadan, 2012). However, many researchers highlighted numerous problems with linear regression models (Miaou & Lum, 1993; Abdel-Aty & Radwan, 2000; Lord & Mannering, 2010), which led to the adoption of more appropriate models such as Poisson regression. Poisson regression models are generalized linear models that assume an exponential (log-linear) relationship between the dependent and the independent variables (Eenin et al., 2007). One problem that restricts the use of Poisson models is that the mean and the variance of the dependent variable must be equal (Winkelmann, 2003). To overcome this problem, the Negative Binomial (NB) model, also called the Poisson-gamma model, has been investigated as an alternative to the Poisson model because it relaxes the equal mean-variance assumption and can accommodate over-dispersed data (Abdel-Aty & Radwan, 2000).
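The difference between the two count models can be seen by simulation. The sketch below draws Poisson counts and Negative Binomial counts built as a Poisson-gamma mixture (the rate λ is itself gamma-distributed with mean μ and shape θ), then compares sample means and variances. For the NB draws the variance should be close to μ + μ²/θ rather than μ. The values μ = 4, θ = 2 and the sample size are arbitrary illustration choices, not quantities from this study.

```python
import math
import random
import statistics


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Sample Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def nb_draw(mu: float, theta: float, rng: random.Random) -> int:
    """Sample NB(mu, theta) as a Poisson-gamma mixture with E[lambda] = mu."""
    lam = rng.gammavariate(theta, mu / theta)  # shape=theta, scale=mu/theta
    return poisson_draw(lam, rng)


rng = random.Random(2014)
mu, theta, n = 4.0, 2.0, 20_000

pois = [poisson_draw(mu, rng) for _ in range(n)]
nb = [nb_draw(mu, theta, rng) for _ in range(n)]

# Poisson: variance close to the mean (about 4).
# NB: variance close to mu + mu**2 / theta = 4 + 16 / 2 = 12 (over-dispersed).
pois_mean, pois_var = statistics.mean(pois), statistics.variance(pois)
nb_mean, nb_var = statistics.mean(nb), statistics.variance(nb)
print(f"Poisson: mean={pois_mean:.2f} var={pois_var:.2f}")
print(f"NB:      mean={nb_mean:.2f} var={nb_var:.2f}")
```

Both samples have roughly the same mean; only the NB sample shows the extra variance that a Poisson model cannot represent.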

Data Collection
The primary crash data needed for the development of the CPM were obtained from the Jordan Traffic Institute (JTI) for the year 2014. The data included the frequency and severity of crashes as well as data on crash contributing factors.

Determination of Crash Contributing Factors
Determination of crash contributing factors is the most important step in the development of a CPM. For the purpose of this study, crashes were categorized into severe and non-severe crashes. Severe crashes are crashes that result in a fatality or an injury; non-severe crashes are crashes that result in neither. The collected crash data for 2014 included 102,441 crashes, 15.10 % of which were severe. The data were organized into eight crash contributing factors: age; age and gender; driver's faults; environmental conditions (road lighting, road surface and weather conditions); time (season, day and time of day); speed; roadway defects; and vehicle defects. Each of these factors was divided into several crash contributing variables, as discussed in the following sections.
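The severity definition above is a simple binary rule on each crash record. The sketch below applies it, assuming a hypothetical record type `Crash` with fatality and injury counts (the JTI record layout is not described here). It also converts the quoted 15.10 % severe share back into an approximate count for 2014; this count is derived arithmetic, not a figure reported in the text.

```python
from dataclasses import dataclass


@dataclass
class Crash:
    """Hypothetical crash record; field names are illustrative only."""
    fatalities: int
    injuries: int


def is_severe(crash: Crash) -> bool:
    """Severe = at least one fatality or injury (the study's definition)."""
    return crash.fatalities > 0 or crash.injuries > 0


def severe_share(crashes: list[Crash]) -> float:
    """Fraction of crashes classified as severe."""
    return sum(is_severe(c) for c in crashes) / len(crashes)


# 2014 totals from the text: 102,441 crashes, 15.10 % of them severe.
approx_severe_2014 = round(102_441 * 0.1510)
print(approx_severe_2014)  # 15469
```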
Two steps were used to determine the contributing factors that significantly affect crash severity in Jordan:
1. A preliminary descriptive analysis of the crash contributing factors for 2014 was carried out to identify and quantify the factors affecting crash severity in Jordan.
2. The Negative Binomial Regression technique was then used to develop the CPM that links crash severity to the identified factors.

Step One: Preliminary Analysis
Table 2 through Table 9 show the frequency distribution of severe crashes for each crash contributing factor mentioned earlier.
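The tables in this step report two kinds of percentages: the share of all severe crashes that falls under each variable (column share), and, where JTI documented total crashes, the proportion of crashes under a variable that were severe. A minimal sketch of both, using made-up counts for a three-level factor (these are not JTI data). Where total crashes were not documented, only the column share can be computed.

```python
def column_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the column total falling under each variable."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}


def severe_rate_within(severe: dict[str, int],
                       all_crashes: dict[str, int]) -> dict[str, float]:
    """Percentage of crashes under each variable that were severe."""
    return {k: 100 * severe[k] / all_crashes[k] for k in severe}


# Hypothetical counts for illustration only.
severe = {"good": 60, "fair": 30, "poor": 10}
total = {"good": 400, "fair": 150, "poor": 40}
print(column_shares(severe))              # {'good': 60.0, 'fair': 30.0, 'poor': 10.0}
print(severe_rate_within(severe, total))  # {'good': 15.0, 'fair': 20.0, 'poor': 25.0}
```

The two measures can rank variables differently: in this toy example "good" has the largest share of severe crashes but the lowest severe proportion.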

Age Group
Driver's age is a demographic variable of great importance since it identifies groups of drivers with a higher crash tendency. Drivers were grouped into six age-group variables (≤ 20, 21-30, 31-40, 41-50, 51-60 and > 60).

Age and Gender
Table 3 shows the analysis of severe crashes based on age and gender. Males are more likely to be involved in crashes than females in all age groups. This might be explained by the fact that males drive more and are more willing to take risks than females. Also, the total number of crashes was not documented by the JTI.

Environmental Conditions
Environmental conditions include road lighting, road surface and weather conditions. Road lighting conditions were divided into four variables: good (daylight), fair (night with sufficient light), poor (night without sufficient light) and other (dark, sunset and sunrise). Road surface conditions were divided into three variables: good (dry surface), fair (wet, muddy or sandy surface) and poor (snow, ice or oily surface). Weather conditions were divided into two variables: good (no adverse conditions) and poor (fog, rain, snow, dust or strong wind). Table 5 shows the frequency and percentage of crashes, severe crashes and non-severe crashes related to environmental conditions. Of all severe crashes, 77.07 % occurred under good lighting conditions, 95.88 % under good road surface conditions and 96.52 % under good weather conditions. These results indicate that the majority of severe crashes occurred under good lighting, road surface and weather conditions. This is expected because good environmental conditions prevail throughout most of the year in Jordan, and such conditions are favorable for speeding.
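The age and environmental variables above are recodings of raw record fields into a few categories. A minimal sketch of that recoding, under stated assumptions: the raw label strings (such as "dry" or "night, sufficient light") are hypothetical stand-ins for whatever JTI records contain, and the sixth age band is taken to be everything above 60.

```python
def age_group(age: int) -> str:
    """Map an integer driver age to one of six bands (upper band assumed > 60)."""
    if age <= 20:
        return "<=20"
    for lo, hi in ((21, 30), (31, 40), (41, 50), (51, 60)):
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return ">60"


# Hypothetical raw labels mapped to the study's lighting categories.
LIGHTING = {
    "daylight": "good",
    "night, sufficient light": "fair",
    "night, insufficient light": "poor",
    "dark": "other", "sunset": "other", "sunrise": "other",
}

# Hypothetical raw labels mapped to the study's road surface categories.
SURFACE = {
    "dry": "good",
    "wet": "fair", "mud": "fair", "sand": "fair",
    "snow": "poor", "ice": "poor", "oil": "poor",
}

# Adverse weather conditions listed in the text; anything else counts as good.
ADVERSE_WEATHER = {"fog", "rain", "snow", "dust", "strong wind"}


def weather_condition(raw: str) -> str:
    """Two-level weather variable: poor if any listed adverse condition."""
    return "poor" if raw in ADVERSE_WEATHER else "good"


print(age_group(19), age_group(45), age_group(67))  # <=20 41-50 >60
```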

Time
The year was divided into four seasons/variables: spring (March through May), summer (June through August), fall (September through November) and winter (December through February). The week was divided into the conventional seven days, and the day was divided into four time periods: morning peak (6 am to 10 am), day time (10 am to 4 pm), evening peak (4 pm to 10 pm) and night time (10 pm to 6 am).

Speed
Speed was divided into eleven variables in 10 km/hr increments, from 20 km/hr to 120 km/hr. Table 7 shows the frequency and percentage of severe and non-severe crashes for each speed limit increment. Severe crashes increase with the speed limit up to 40 km/hr, where they reach their highest number, remain relatively high up to 60 km/hr, and then decline noticeably for speed limits greater than 60 km/hr. This may be explained by the fact that higher speed limits are associated with roads of higher functional classification, on which higher design standards are implemented.

Vehicle Defects
Vehicle defects were also analyzed as a crash contributing factor. They were divided into nine variables: worn tires, lighting defects, steering defects, brake defects, windshield defects, mirrors, direction indicators, mud pads and engine failure. Table 9 shows the number of severe crashes for each vehicle defect variable. The table does not include the total number of crashes, since it was not documented by the JTI. Brake defects yielded the highest number of severe crashes (61.25 % of the total number of severe crashes), followed by defective mirrors (28.05 %).
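The time variables (season, day of week, time period) can be derived from a crash timestamp. A minimal sketch, assuming half-open hour bands (for example 10:00 counts as day time, not morning peak), since the text does not say how boundary hours were assigned. It also assumes a Friday and Saturday weekend; the text names only Friday explicitly.

```python
from datetime import datetime

# Meteorological seasons as defined in the text.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

# datetime.weekday(): Monday=0 ... Friday=4, Saturday=5 (assumed weekend days).
WEEKEND = {4, 5}


def season(ts: datetime) -> str:
    return SEASONS[ts.month]


def time_period(ts: datetime) -> str:
    """Half-open bands [6,10), [10,16), [16,22); night otherwise (assumed)."""
    h = ts.hour
    if 6 <= h < 10:
        return "morning peak"
    if 10 <= h < 16:
        return "day time"
    if 16 <= h < 22:
        return "evening peak"
    return "night time"


def is_weekend(ts: datetime) -> bool:
    return ts.weekday() in WEEKEND


crash_time = datetime(2014, 7, 4, 8, 30)  # a Friday morning
print(season(crash_time), time_period(crash_time), is_weekend(crash_time))
# summer morning peak True
```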

Step Two: Modeling of Crash Data
As mentioned earlier, Poisson and Negative Binomial regression are the most popular methods for modeling count data. However, the Poisson regression model requires the mean and the variance of the data to be equal. When the variance of the data is larger than its mean, the data are over-dispersed, and the negative binomial model is then the more suitable model (Washington et al., 2011). To test the data for over-dispersion, the dispersion test suggested by Cameron and Trivedi (1998) was applied. The null hypothesis of this test is that the mean and the variance of the data are equal, i.e. E(Y) = μ and Var(Y) = μ. This null hypothesis is tested against the alternative hypothesis Var(Y) = μ + c·f(μ), where f(μ) is a linear function of μ; a constant c less than zero indicates under-dispersion and c greater than zero indicates over-dispersion. The test was applied using the dispersiontest function of the AER package in the R statistical software (version 3.4). The estimated value of c was 5.215, which clearly indicates that the data are over-dispersed; therefore, the Negative Binomial model was applied using R.
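The regression-based test can be sketched in a few lines once fitted Poisson means are available. With f(μ) = μ, the auxiliary quantity ((y − μ̂)² − y)/μ̂ has expectation c, so c is estimated by its sample mean and tested with a one-sided z statistic. We believe this corresponds to the default (trafo = NULL) form of AER's dispersiontest, which reports 1 + c as its "dispersion" estimate. Treat that correspondence as an assumption and check which of the two quantities a reported value refers to. The function name and the toy data are illustrative only; for an intercept-only Poisson model the fitted mean is simply the sample mean (2 here).

```python
import statistics
from statistics import NormalDist


def dispersion_test(y: list[float], mu_hat: list[float]) -> tuple[float, float, float]:
    """Cameron-Trivedi style test of H0: Var(Y) = mu vs H1: Var(Y) = mu + c*mu.

    `mu_hat` must be fitted means from a Poisson model.
    Returns (c_hat, z statistic, one-sided p-value for c > 0).
    """
    aux = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu_hat)]
    n = len(aux)
    c_hat = statistics.mean(aux)
    z = n ** 0.5 * c_hat / statistics.stdev(aux)
    p_value = 1.0 - NormalDist().cdf(z)
    return c_hat, z, p_value


# Toy over-dispersed counts with mean 2 (intercept-only Poisson fit: mu_hat = 2).
y = [0, 0, 0, 0, 8, 0, 0, 6, 0, 6]
mu_hat = [2.0] * len(y)
c_hat, z, p = dispersion_test(y, mu_hat)
print(f"c_hat={c_hat:.2f} dispersion={1 + c_hat:.2f} z={z:.2f} p={p:.4f}")
```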
The results of the Negative Binomial Regression model are shown in Table 11. The model contains an intercept and the statistically significant crash contributing variables; 20 variables were found to have a statistically significant impact on severe crashes. A positive parameter sign indicates that severe crashes increase as the value of the related variable increases, and a negative sign indicates that severe crashes decrease as the value of the related variable increases. Accordingly, the results in Table 11 indicate that severe crashes decrease as the age of both male and female drivers increases. This can be attributed to the increase in drivers' experience with age. However, the parameter values indicate that severe crashes decrease at a higher rate with increasing age for female drivers than for male drivers. The same relationship was obtained between crash severity and both the environmental and the time factors. First, crash severity decreases as road lighting conditions improve. This can be attributed to the fact that drivers are more cautious when driving under poor road lighting conditions. The same conclusion applies to the relation between crash severity and road surface condition, where crash severity under good road surface conditions is higher than under bad road surface conditions. In addition to the environmental factors, the results in Table 11 indicate that crash severity is higher on weekdays than at weekends, and in the morning than in the evening. However, the small parameter values for these two variables indicate that these two factors do not contribute much to crash severity. On the other hand, the results in Table 11 indicate that the number of severe crashes increases when drivers commit driving faults. Finally, mirror and brake defects were found to be the only vehicle defect factors that contribute to severe crashes. These findings agree with the descriptive analysis of the crash contributing factors, except that speed and roadway defects were found to have no statistically significant effect in the CPM.
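Why a small parameter value means a small contribution becomes clearer once a coefficient is converted to a multiplicative effect. Assuming the model uses the standard log link (the default of MASS::glm.nb in R; the fitting function is not named in the text), a one-unit increase in a variable multiplies the expected number of severe crashes by exp(β). The coefficients below are hypothetical examples, not values from Table 11.

```python
import math


def rate_ratio(beta: float) -> float:
    """Multiplicative change in expected severe crashes per one-unit increase."""
    return math.exp(beta)


def percent_change(beta: float) -> float:
    """Percentage change in expected severe crashes per one-unit increase."""
    return 100.0 * math.expm1(beta)


# Hypothetical coefficients for illustration only (not from Table 11).
for name, beta in [("example_negative", -0.20), ("example_small", 0.02)]:
    print(name, round(rate_ratio(beta), 3), round(percent_change(beta), 1))
# example_negative 0.819 -18.1
# example_small 1.02 2.0
```

A coefficient of 0.02 changes the expected count by only about 2 % per unit, which is the sense in which small parameters contribute little.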

Summary
The objectives of this research were to identify and quantify the factors affecting crash severity in Jordan by analyzing the necessary crash data, and to develop a crash prediction model that relates crash severity to the identified factors. The data were obtained from the Jordan Traffic Institute (JTI) for the year 2014 and included the number and severity of crashes for each crash contributing factor. The collected crash data were organized into nine crash contributing categories: driver's age, driver's age and gender, driver's fault, crash type, time, speed, environmental factors, roadway defects and vehicle defects. Two steps were used to determine the contributing factors that significantly affect crash severity. First, a preliminary descriptive analysis of the crash contributing factors was carried out to identify and quantify the factors affecting crash severity. Then, the Negative Binomial Regression model was developed using the R statistical software. The results indicated that the model contained 20 variables that significantly impact severe crashes in Jordan. Among the general findings, severe crashes decreased significantly as the age of both male and female drivers increased, and they decreased significantly as environmental conditions improved. In addition, severe crashes were significantly more frequent on weekdays than at weekends and in the morning than in the evening. The results also indicated that severe crashes increased significantly when drivers committed faults while driving.
In addition, mirror and brake defects were found to be the only vehicle defect factors that contributed significantly to severe crashes. Finally, the results of the Negative Binomial Regression model were found to agree with the descriptive analysis of the crash contributing factors.

Table 1.
Number of Crashes, Vehicle Ownership, Fatalities, and Fatality Rates in Jordan

Table 3.
Crashes by Age and Gender

Table 4 illustrates thirteen driver's fault variables that might contribute to severe traffic crashes. Drivers' faults included: tailgating, failing to take necessary precautions, using an incorrect lane, priority false, improper backing, failing to yield, speeding, loss of control while driving, improper turn, running a red light, driving in the opposite direction and wrong maneuver. The table also shows the total number of crashes, severe crashes and non-severe crashes caused by each driver's fault. Failing to take the necessary precautions resulted in the highest number of driver's fault crashes (27.89 %), followed by tailgating (21.91 %) and priority false (13.25 %). These three faults accounted for 26.62 %, 24.92 % and 13.92 % of severe crashes, respectively.

Table 5.
Crashes under Different Environmental Conditions

Table 6 shows the number of crashes, frequency and percentage, for each season, day and time of day. Of the severe crashes, 28.35 % occurred during the summer months (June to August), while the winter months witnessed the lowest number of severe crashes (21.48 %). The lowest number of severe crashes occurred during the weekend, specifically on Friday (13.33 %). The table also shows that about 34.55 % of severe crashes occurred during day time, followed by 32.77 % during the evening peak, while the lowest number of severe crashes occurred during the morning peak (11.32 %).

Table 7.
Crashes by Speed Limit

Roadway defects were divided into six variables: defective shoulder area, defective roadway surface, signing obstructions, defective traffic control device, improper horizontal curve design and others. Table 8 illustrates the analysis of severe crashes for different roadway defects. The table does not include the total number of crashes, as it was not documented by the JTI. Defective roadway surface contributed the most to severe crashes, accounting for 5.01 % of the total number of severe crashes caused by defective roadway conditions.

Table 8.
Crashes under Different Roadway Conditions

Table 9.
Crashes under Different Vehicle Defects

Table 11.
Negative Binomial Regression Results