The Impact of Sidewalks on Vehicle-Pedestrian Crash Severity

Walking is a sustainable mode of transportation with several benefits, including improved health and reduced traffic congestion. Its drawback is the greater potential for a severe outcome when a pedestrian is involved in a crash, compared with a crash between two automobiles. This paper provides a statistical analysis of pedestrian crashes that occurred in two Alabama cities, with the crashes divided into those where a sidewalk was present and those where a sidewalk was not present. The goal of the paper is to determine the differences in crash experience, and in the variables that contribute to vehicle-pedestrian crashes, associated with the presence of a sidewalk. The paper uses binary logistic regression to develop models of pedestrian crashes and evaluates the models to determine the factors that contribute to pedestrian crashes. The paper concludes that pedestrian crashes often happen in the evenings, with low lighting and visibility levels, independent of the presence of sidewalks.


Introduction to the Problem
Walking is a vital and sustainable means of transportation and is gaining popularity. The 2009 National Household Travel Survey (NHTS) estimated that 42 billion walking trips are made every year in the US, accounting for 10.5% of the total trips taken (1). The safety of these pedestrians is therefore a top priority. In 2014, almost 5,000 pedestrians were killed and 65,000 injured in traffic crashes in the US, with 78% occurring in urban areas (2). In Alabama, there were a total of 759 vehicle-pedestrian crashes resulting in 283 fatalities and incapacitating injuries, with another 387 pedestrians injured, and 84% of the crashes were reported in urban areas (3).
To ease pedestrian movement, sidewalks are usually constructed along roadways to give those walking a quality, weather-resistant surface for their trip. Additionally, the presence of a sidewalk lends legitimacy to the walking trip and a perceived level of safety that the pedestrian might use to justify making the trip. However, even with a sidewalk in place, a vehicle-pedestrian crash can still occur.
This paper provides a statistical analysis of crashes that occurred in two Alabama cities, with the pedestrian crashes divided into those where a sidewalk was present and those where a sidewalk was not present. The goal of the paper is to determine the differences in crash experience, and in the variables that contribute to vehicle-pedestrian crashes, associated with the presence of a sidewalk. The paper uses binary logistic regression to develop models of pedestrian crashes and evaluates the models to determine the factors that contribute to pedestrian crashes. The paper concludes that pedestrian crashes often happen in the evenings, with low lighting and visibility levels, independent of the presence of sidewalks.

Related Literature
Vehicle-pedestrian crashes have been examined by several researchers looking at different aspects of the problem. Several statistical methodologies have been used to model pedestrian crashes, including mixed logit, logistic regression, ordered probit, and binary logistic regression (4,5,6,7,8,9,10,11).
Studies have concluded that increases in speed lead to more severe crashes, while increases in the number and width of lanes tended to decrease the number of crashes (12,13,14). The urban environment has also been studied, and land use and transit availability have been found to influence pedestrian crashes, typically negatively: walking-friendly development and increased transit tend to have higher instances of pedestrian crashes. However, it is often assumed that these higher numbers are actually lower on a comparative rate basis once pedestrian exposure and the number of people choosing to walk are considered (15,16,17,18,19,20). Other factors, such as traffic operations, have been shown to decrease crashes (21,22).
Studies by McMahon and colleagues have concluded that pedestrian crashes tend to be higher in locations where sidewalks do not exist than in locations where sidewalks are present (23,24). Another paper by Retting et al. concluded that sidewalks can reduce the risk of pedestrian crashes in residential areas (25). With regard to residential areas, several studies indicated that traffic calming devices, intended to reduce the speed of vehicles, can also reduce the number of pedestrian crashes because many were caused by children, who often do not accurately gauge the speed of vehicles and tend to cross mid-block (26,27,28,29). Pedestrian visibility was often cited as an issue in crashes, and the installation of lights for nighttime pedestrians was presented as a means to improve safety (30,31).
This study performs a binary logistic regression analysis to test the impact of the presence of sidewalks on pedestrian crash severity, which has not been covered in the related literature on pedestrian crashes.

Methodology
To analyze the differences between crashes that occur with a sidewalk present and those that occur without a sidewalk present, a statistical model is used. The statistical modeling tool used in this study is binary logistic regression, run in IBM SPSS Statistics 24.

Logistic Regression
The goal of using binary logistic regression is the same as for any type of modeling analysis: to find the best-fitting and most parsimonious model. What distinguishes the logistic regression model from a linear regression model is the response variable. In the logistic regression model, the response variable is binary or dichotomous (32). The difference between logistic and linear regression is reflected both in the choice of a parametric model and in the assumptions. Once this difference is accounted for, the methods employed in an analysis using logistic regression follow the same general principles used in linear regression analysis.

Binary Logistic Regression
Binary logistic regression estimates the probability that a characteristic is present (e.g., the probability of "success") given the values of explanatory variables (32).
The definitions of the variables Y and X are as follows: Let $Y$ be a binary response variable such that, for each observation $i$, $Y_i = 1$ if the trait is present in observation $i$ and $Y_i = 0$ if the trait is not present in observation $i$.
Let $X = (X_1, X_2, \ldots, X_k)$ be a set of explanatory variables, which can be discrete, continuous, or a combination. $x_i$ is the observed value of the explanatory variables for observation $i$.
For our analysis, the response variable will be $Y_i = 1$ when a crash with a certain severity is observed and $Y_i = 0$ if the alternate severity is recorded. The explanatory variables $X_1, X_2, \ldots, X_k$ will be collected from the crash analysis database in an attempt to explain the dependent variable.
Setting up these variables gives us the model (33):

\[ \pi_i = \Pr(Y_i = 1 \mid X_i = x_i) = \frac{\exp(\beta_0 + \beta x_i)}{1 + \exp(\beta_0 + \beta x_i)} \tag{1} \]

or, equivalently,

\[ \operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta x_i \tag{2} \]

where $\beta x_i = \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$.

Several assumptions must be made for the model to be correct. First, the observations $Y_i$ must be independently distributed; that is, the cases are independent of one another. In addition, the distribution of $Y_i$ is $\mathrm{Bin}(n_i, \pi_i)$; i.e., the binary logistic regression model assumes a binomial distribution of the response (32). The dependent variable does not need to be normally distributed, but it typically assumes a distribution from an exponential family. The model does not assume a linear relationship between the dependent and independent variables, but it does assume a linear relationship between the logit of the response and the explanatory variables; $\operatorname{logit}(\pi) = \beta_0 + \beta X$ (33). Homogeneity of variance does not need to be satisfied. Errors need to be independent but not normally distributed. Because maximum likelihood estimation, rather than ordinary least squares, is used to estimate the parameters, the model relies on large-sample approximations.
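The logit link in the model can be sketched in a few lines of Python. This is only an illustration of the relationship between the linear predictor and the probability of a severe crash; the coefficient values and the darkness coding below are hypothetical and are not estimates from this study.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor eta = b0 + b1*x1 + ... to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Return the log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients (NOT from the paper): intercept and one dummy variable.
b0, b1 = -1.0, 0.8
x_dark = 1  # illustrative coding: 1 = darkness, 0 = daylight

# Predicted probability of a severe crash for this made-up case.
p_severe = inverse_logit(b0 + b1 * x_dark)
print(round(p_severe, 3))
```

Because `logit` and `inverse_logit` are inverses, a coefficient shifts the log-odds linearly even though its effect on the probability itself is nonlinear.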
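To determine the goodness of fit for the model, several statistics must be considered, namely the chi-square statistic, the deviance $G^2$, the likelihood ratio test and its statistic $\Delta G^2$, and the Hosmer-Lemeshow test and statistic. For estimating the parameters, the maximum likelihood estimator (MLE) for $(\beta_0, \beta_1)$ is obtained by finding $(\hat{\beta}_0, \hat{\beta}_1)$ that maximizes (33):

\[ L(\beta_0, \beta_1) = \prod_{i=1}^{N} \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i} = \prod_{i=1}^{N} \frac{\exp\{y_i(\beta_0 + \beta_1 x_i)\}}{\{1 + \exp(\beta_0 + \beta_1 x_i)\}^{n_i}} \]

where $y_i$ is the observed number of successes out of $n_i$ trials for observation $i$ and $\pi_i$ is the corresponding success probability.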
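As a rough sketch of what maximum likelihood estimation involves for a single explanatory variable, the following Python function applies Newton-Raphson updates to the log-likelihood with $n_i = 1$ for every observation. It is not the SPSS procedure used in the study, only the same estimation principle, and the toy data are invented for illustration.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of logit(pi) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Information matrix (negative Hessian).
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system H * delta = g and update the estimates.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented toy data: x = 0 (daylight) with 3 of 10 severe, x = 1 (dark) with 6 of 10 severe.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4

b0_hat, b1_hat = fit_logistic(x, y)
print(b0_hat, b1_hat, math.exp(b1_hat))  # exp(b1_hat) is the estimated odds ratio
```

With a single binary explanatory variable the estimates have a closed form (the log-odds in the reference group and the log of the sample odds ratio), which gives a convenient check on the iteration.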

Data Preparation
The data used in this analysis were extracted from the Critical Analysis Reporting Environment (CARE) maintained by the Center for Advanced Public Safety at the University of Alabama. Pedestrian crash data were obtained for Huntsville and Montgomery, two Alabama cities of similar size. Both cities have populations of around 200,000, and sidewalk availability is limited to selected locations throughout the cities, such that there are several areas with and without sidewalks.
To perform the analysis and generate a sufficient amount of data, pedestrian crash data from both cities were aggregated and organized into two datasets, "Sidewalk Present" and "No Sidewalk Present". The response variable had two levels to capture the severity of the pedestrian crash: severe, indicating that a fatality or incapacitating injury occurred, and not severe, indicating that a minor injury or no injury occurred.

Sidewalk Present
For data analysis purposes, any incident that occurred within 20 meters of a sidewalk was defined to be a sidewalk present crash. An assumption was made that the presence of a sidewalk implied that the pedestrian was using the sidewalk correctly, as there is no mechanism to be certain that the pedestrian was not walking in the roadway near a sidewalk. The total number of crashes in this group from both cities is 149.

No Sidewalk Present
All pedestrian crash data not included in the Sidewalk Present group were analyzed as No Sidewalk Present crashes. The total number of crashes in this group from both cities is 120. From an exposure standpoint, the number of crashes that occurred without a sidewalk present is interesting because pedestrian traffic would not typically be expected in these areas, as there are no sidewalks to encourage walking trips. In addition, areas without sidewalks differ in roadway lighting, shoulder availability, edge-of-pavement maintenance, and ditch placement.
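The grouping and severity coding described in this section can be sketched as follows. The record layout, field names, and injury labels are hypothetical (the actual CARE fields are not given in the text), and treating a distance of exactly 20 meters as "within 20 meters" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SIDEWALK_BUFFER_M = 20.0  # threshold stated in the text
SEVERE_LEVELS = {"fatal", "incapacitating"}  # hypothetical labels, not CARE codes


@dataclass
class Crash:
    crash_id: str
    injury_level: str
    distance_to_sidewalk_m: Optional[float]  # None = no sidewalk found nearby


def is_severe(crash: Crash) -> int:
    """Return 1 for a fatal or incapacitating injury, 0 otherwise."""
    return 1 if crash.injury_level.strip().lower() in SEVERE_LEVELS else 0


def split_by_sidewalk(crashes: List[Crash]) -> Tuple[List[Crash], List[Crash]]:
    """Split crashes into (sidewalk present, no sidewalk present) groups."""
    present, absent = [], []
    for crash in crashes:
        d = crash.distance_to_sidewalk_m
        if d is not None and d <= SIDEWALK_BUFFER_M:
            present.append(crash)
        else:
            absent.append(crash)
    return present, absent


# Invented example records.
sample = [
    Crash("A1", "Fatal", 5.0),
    Crash("A2", "Minor", 35.0),
    Crash("A3", "Incapacitating", None),
    Crash("A4", "None", 20.0),
]
present, absent = split_by_sidewalk(sample)
print([c.crash_id for c in present], [c.crash_id for c in absent])
print([is_severe(c) for c in sample])
```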

Contributing Circumstances
When examining the pedestrian crashes, a number of different elements can contribute to the crash, as recorded by the officer completing the crash report. The crash could have been the fault of the driver or the pedestrian. For the analysis, the primary contributing circumstance has been divided into two groups, pedestrian at-fault and driver at-fault, as shown in Table 1.

Model Development
The statistical analysis process requires a number of steps. Each step includes a selection of variables and provides summary statistics to evaluate the model. The results from the step methodology can be interpreted using the matrix shown in Table 2. The horizontal axis holds the predicted values from the analysis and the vertical axis holds the observed condition. The cells "Severe x Severe" and "Not Severe x Not Severe" are cases in which the model accurately predicts the observed case. When the prediction suggests a severe case but the observation is not severe ("Severe x Not Severe"), no issue is raised because this creates a more conservative prediction and builds in a factor of safety. However, a case that is predicted to be not severe but is observed as severe ("Not Severe x Severe") underestimates the severity of the crash and therefore overstates the safety of the section. Therefore, the combination that produces the least liberal (or most conservative) case is chosen, as it has the highest factor of safety.
The other value used in the model development task was the overall percentage correct. If multiple steps have the same level of safety, i.e., the lowest number of underestimated cases, the overall model accuracy percentage is compared to select the most appropriate combination of variables.
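The selection rule described above (fewest underestimated cases first, overall percentage correct as the tie-breaker) can be sketched as follows. The class name, field names, and all counts are invented for illustration and do not reproduce Table 3 or Table 5.

```python
from typing import List, NamedTuple


class StepResult(NamedTuple):
    step: int
    true_severe: int      # observed severe, predicted severe
    missed_severe: int    # observed severe, predicted not severe (underestimated)
    false_severe: int     # observed not severe, predicted severe (conservative)
    true_not_severe: int  # observed not severe, predicted not severe

    @property
    def accuracy(self) -> float:
        """Overall proportion of cases predicted correctly."""
        total = (self.true_severe + self.missed_severe
                 + self.false_severe + self.true_not_severe)
        return (self.true_severe + self.true_not_severe) / total


def select_step(steps: List[StepResult]) -> StepResult:
    """Pick the step with the fewest underestimated cases; break ties by accuracy."""
    return min(steps, key=lambda s: (s.missed_severe, -s.accuracy))


# Invented classification counts for three candidate steps (100 cases each).
candidates = [
    StepResult(1, 12, 8, 5, 75),
    StepResult(2, 15, 5, 10, 70),
    StepResult(3, 15, 5, 6, 74),
]
best = select_step(candidates)
print(best.step, best.missed_severe, best.accuracy)
```

In this made-up example, steps 2 and 3 tie on underestimated cases, so the higher overall accuracy of step 3 decides the choice.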

Sidewalk Present
Table 3 shows the severe and not severe cases in each step using different variables for locations where there is a sidewalk present. The observed and predicted values are generated, and the overall percentage correct shows how accurately the model predicted the observed condition. The steps are iterations over combinations of applicable variables, carried out to determine the combination that most accurately predicts the observed case. Using both selection values and Table 3, the model from step 6 was selected for use in this analysis. The specific variables and results of the model obtained from the software are shown in Table 4.
From the data presented in Table 4, the odds ratios show the odds that a severe crash will occur in the variable data set compared to a reference category. For example, for light condition, the odds ratio indicates that the odds of a severe incident are approximately 2.26 times higher in darkness than in daylight. This result makes sense because drivers have difficulty seeing individuals walking in the evening and during dark periods. It also coincides with the time-of-day odds ratio, which shows a much higher likelihood of being in a severe crash after 7:00 PM. A similar odds ratio is obtained for clear versus rainy weather, indicating a much higher likelihood that individuals will be walking when the weather is nice, and therefore a greater potential exposure to a severe crash.
For the causal age groups, the 25 to 54 group has an odds ratio of 3.2. However, considering the reference category and the other variables, the overall number of drivers in Huntsville from this population is likely influencing the number of severe crashes with pedestrians: the reference category, 17 to 24, is relatively small, and the other groups have a larger number of potential drivers than the reference group and tend to drive more miles. For the individuals likely to be involved in a crash as a pedestrian, the most likely age range is 16 to 25. This is also logical, as these individuals often take greater risks.
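The odds ratio of 2.26 reported for darkness versus daylight can be related to a model coefficient and to probabilities with a short calculation. The sketch below assumes a hypothetical daylight probability of a severe crash of 0.20, which is not a value from this study; it is used only to show that an odds ratio multiplies odds, not probabilities.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


OR_DARK = 2.26  # odds ratio for darkness vs. daylight, as reported in the text

# The logistic regression coefficient is the natural log of the odds ratio.
beta_dark = math.log(OR_DARK)

# Hypothetical baseline (NOT from the paper): probability of a severe crash in daylight.
p_daylight = 0.20
p_dark = prob_from_odds(OR_DARK * odds(p_daylight))

print(round(beta_dark, 3), round(p_dark, 3), round(p_dark / p_daylight, 2))
```

Under this assumed baseline, the ratio of probabilities is smaller than the odds ratio, so the odds ratio should be read as a multiplier on the odds of a severe crash.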

No Sidewalk Present
Table 5 shows the severe and not severe cases in each step using different variables for locations where there is not a sidewalk present. The observed and predicted values are generated, and the overall percentage correct shows how accurately the model predicted the observed condition. The steps are iterations over combinations of applicable variables, carried out to determine the combination that most accurately predicts the observed case. Using both selection values and Table 5, the model from step 3 was selected for use in this analysis. The specific variables and results of the model obtained from the software are shown in Table 6. From the data presented in Table 6, the odds ratios show the odds that a severe crash will occur in the variable data set compared to the reference category. For example, weekends are more likely to have a severe pedestrian crash when no sidewalk is present. This could indicate that more individuals are walking on roadways without sidewalks during the weekends. In addition, the data show that during evening hours, after 7:00 PM, during non-daylight hours when it is not raining, the likelihood of being involved in a severe pedestrian crash is higher. These conclusions make sense conceptually, as the combination of darkness, nighttime, and weekend walking without sidewalks all tend to lead to higher severity pedestrian crashes.
The additional causal factor, pedestrian under the influence, is a contributing circumstance that increases these crashes, as the presence of alcohol or drugs impairs judgment and can lead pedestrians to attempt to cross when there is not a sufficient gap in traffic, or to walk unsafely along a roadway. Finally, residential locations where the roadways are curved tended to lead to higher crash severity for pedestrians. Again, this is logical, as most individuals walking at night would be near their residence, and the curvature of the roadway would obscure the driver's vision and reduce the reaction time available to avoid the crash.

Comparison
One difference between the factors when sidewalks are present and when sidewalks are not present is that the sidewalk present model includes a variable related to the driver, while the no sidewalk present model has a variable related to the pedestrian. This indicates that the models are assigning different causal units based on the presence of the infrastructure. When sidewalks are present, the crashes are caused by the driver, or at least attributed to the driver in the reporting officer's opinion. Alternatively, when sidewalks are not present, the pedestrian is reported to be responsible for the crash at a much higher and statistically significant level.

Conclusions
This paper examined pedestrian crash characteristics for severe versus not severe crashes for situations when a sidewalk is present and those when a sidewalk is not present. In both instances, higher severity pedestrian crashes tended to occur in the evening hours, during periods of darkness. This conclusion is important because driver education can be introduced to expose this issue and make drivers aware that pedestrians are out walking during evening hours, and that they should not assume, because of the hour, that pedestrians will not be present along the roadways.
In both the sidewalk present and no sidewalk present scenarios, males tend to have a higher likelihood of being in a severe pedestrian crash. This may be attributed to the comfort level of males walking during the evening and darkness hours, or may be a reflection of risk-taking attitudes, especially when the presence of alcohol or drugs might be a factor. Additionally, walking on curved roadways was seen in both instances to increase severity, as the sight distance is limited. It is interesting to note that crashes tend to be more severe in residential neighborhoods when sidewalks are not present during the weekends, indicating that individuals might be more likely to feel comfortable walking without a sidewalk in residential locations than in commercial areas.
Overall, the comparison indicated that the presence of sidewalks does not lead to an extreme difference between the factors that influence the severity of pedestrian crashes in these two case study cities. Generally, pedestrians are more likely to be involved in a severe crash when walking during evening hours when the weather is good and visibility is low due to lighting conditions.

Table 1. Primary Contributing Circumstance for Pedestrian Crashes

Table 2. Step Model Selection Matrix

Table 3. Best Step Option Model, Sidewalk Present

Table 4. Results from SPSS, Sidewalk Present

Table 5. Best Step Option Model, No Sidewalk Present

Table 6. Results from SPSS, No Sidewalk Present