Assessing the Risk of Road Traffic Fatalities Across Sub-Populations of a Given Geographical Zone, Using a Modified Smeed’s Model

Smeed (1949) provided a regression model for estimating road traffic fatalities (RTFs). In this paper, a modified form of Smeed’s (1949) model is proposed, for which it is shown that the multiplicative error term is smaller than that of Smeed’s original model in most situations. Based on this modified Smeed’s model, Bayesian and multilevel methods are developed to assess the risk of road traffic fatalities across sub-populations of a given geographical zone. These methods treat the parameters of Smeed’s model as random variables and therefore make it possible to estimate their variance across space, provided there is significant variation in the intercept of the regression equation across the regions. Using data from Ghana, the sensitivity of the Bayesian estimates to the Normal, Laplace and Cauchy prior distributions was examined at small sample sizes, with the Laplace prior giving robust estimates. The Bayesian and multilevel methods performed at least as well as the traditional method of estimating the parameters and, beyond this, were able to assess risk differences through the spatial variability of these parameters.


Introduction
Smeed (1949) proposed a model for estimating road traffic fatalities (RTFs). He showed that the formula $D = 0.0003\,(NP^2)^{1/3}$, where D = number of RTFs, P = population size and N = number of vehicles in use, gave a fairly good fit to data from 20 countries, including European countries, the USA, Canada, Australia and New Zealand. Ponnaluri (2012) used data from all states in India to develop seven different models for predicting RTFs and examined whether the individual models were relevant for application. The seven models, including Smeed's, were tested for goodness of fit against the actual data, and Smeed's model was found to give the best fit. Ponnaluri showed that the original Smeed formulation cannot simply be discounted for the reasons cited by many researchers, partly because Smeed's model is parsimonious in its use of parameters. According to Ponnaluri (2012), Smeed's model appears to be observation-driven, evidence-based and logically valid in measuring the per-vehicle fatality rate.
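As a quick check of the arithmetic in Smeed's formula, the sketch below evaluates it in both its total form and the equivalent per-vehicle form, $D/N = 0.0003\,(N/P)^{-2/3}$. The function names and the input values are hypothetical and chosen only for illustration; they are not data from this study.

```python
def smeed_fatalities(n_vehicles: float, population: float) -> float:
    """Smeed's (1949) formula: D = 0.0003 * (N * P**2) ** (1/3)."""
    if n_vehicles <= 0 or population <= 0:
        raise ValueError("N and P must be positive")
    return 0.0003 * (n_vehicles * population ** 2) ** (1.0 / 3.0)


def smeed_rate_per_vehicle(n_vehicles: float, population: float) -> float:
    """Equivalent per-vehicle form: D/N = 0.0003 * (N/P) ** (-2/3)."""
    return 0.0003 * (n_vehicles / population) ** (-2.0 / 3.0)


# Hypothetical inputs, chosen only to illustrate the arithmetic
N, P = 1_000_000, 10_000_000
D = smeed_fatalities(N, P)
print(round(D))  # 1392
```

Multiplying the per-vehicle rate by N gives the same D, which is a convenient consistency check when the formula is rearranged.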
The predominant factors affecting RTFs are not the same as those affecting road traffic accidents (RTAs). Exposure to risk (such as human error, environmental and weather conditions, the nature of the road and the condition of vehicles) is the predominant influence on road traffic accidents within a geographical region. The rate of RTFs, however, is determined by vulnerability to risk (such as insufficient ambulance and emergency medical services, improper pre-hospital care for RTA trauma patients and inadequate safety mechanisms in vehicles).
Exposure to the risk of RTFs and vulnerability to the risk of RTFs are not necessarily correlated; high exposure does not imply high vulnerability. For instance, the Greater Accra Region, which has the highest exposure to the risk of RTFs in Ghana (owing to its high population and vehicular densities), has the lowest RTF rate of Ghana's 10 regions, whilst the three Northern regions, with the lowest population densities, have the highest RTF rates (Hesse and Ofosu, 2015). Similarly, Nigeria and Ghana have almost the same vehicular density, yet inhabitants of Nigeria are more likely to die as a result of road traffic accidents. Developing countries, with only about 10% of the world's motorization, account for about 85% of annual RTFs worldwide (WHO, 2004, 2009). Thus, although developed countries have greater exposure to the risk of RTFs because of their high vehicular density, they are less vulnerable to RTFs than developing countries.
Two predominant factors that determine the risk of RTFs in a geographical region are (1) safety mechanisms in vehicles (such as anti-lock braking systems (ABS), air bags and seatbelts) and (2) emergency medical services (such as ambulance services).
One reason why developing countries are more vulnerable to the risk of RTFs is that a large proportion of road traffic accident trauma patients in these regions do not have access to formal emergency medical services (Tiska et al., 2002). Secondly, the age of vehicles and the availability of modern safety mechanisms in vehicles plying the roads in these regions have a significant effect on the consequences of road traffic accidents. If greater attention is paid to improving road safety mechanisms (such as anti-lock braking systems (ABS), air bags, better car design and increased seatbelt use), there could be substantial benefits in reducing injuries and fatalities from road traffic accidents in developing countries (Hesse et al., 2014).

The factors affecting RTAs correspond to exposure, X, while the factors affecting RTFs correspond to vulnerability given the same exposure. In Smeed's model, exposure is measured by the variable X, whereas vulnerability for a given X is captured by the parameters β0 and β1. If two geographical regions have the same value of X but different values of Y, then the difference in Y is not due to X but to the fact that β0 and β1 differ between the two regions. It therefore follows that the parameters of Smeed's model vary from one geographical region to another, so these parameters can be used to assess the variability of the risk of RTFs across geographical regions.

Smeed (1949) and related studies by Ponnaluri (2012), Ghee et al. (1997), Bener and Ofosu (1991), Jacobs and Bardsley (1977) and Fouracre and Jacobs (1977) used the least squares regression (LSR) method to estimate the parameters. However, the LSR approach (1) does not allow for variability of the parameters and (2) is very sensitive to violation of the normality assumption.
Thus, we need an estimation method that (1) is robust with respect to the assumptions of the model, (2) can be used to estimate the variance of the parameters across geographical regions and (3) enables us to compare the risk of RTFs across geographical regions.
As a general objective, therefore, this study aims to develop a statistical methodology, based on Smeed's model, for assessing the risk of RTFs across sub-populations of a given geographical zone. The first specific objective is to develop a modified Smeed's model. Secondly, based on the modified Smeed's model, the study seeks to develop and use (1) the Bayesian analysis approach to derive an estimator, based on a prior distribution, that is robust with respect to the normality assumption, and (2) the multilevel analysis approach to compare the risk of RTFs across geographical regions.
Finally, the study seeks to use data from Ghana to validate the developed method and to assess the robustness of the model.

Method
Another possible linear transformation of Equation (5) can also be used as a risk indicator of RTFs, in line with the general objective of this study (see Hesse and Ofosu, 2014).
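A minimal sketch of fitting a log-linear version of the model by least squares is given below. It assumes that the modified Smeed's model has the form $D/P = \alpha\,(N/P)^{\beta}u$, so that $y = \ln(D/P) = \beta_0 + \beta_1 x$ with $x = \ln(N/P)$; this form is an assumption inferred from the use of ln(D/P) in the validation section, not a transcription of Equation (5). Under this assumption, Smeed's original law is the special case $\beta_0 = \ln(0.0003)$, $\beta_1 = 1/3$. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_modified_smeed(deaths, vehicles, population):
    """OLS fit of ln(D/P) = b0 + b1 * ln(N/P); returns array([b0, b1]).

    The log-linear form is an assumption made for this sketch.
    """
    D, N, P = (np.asarray(a, dtype=float) for a in (deaths, vehicles, population))
    y = np.log(D / P)                         # response: log fatalities per head
    x = np.log(N / P)                         # exposure: log vehicles per head
    X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_deaths(b0, b1, vehicles, population):
    """Back-transform to fatalities: D = P * exp(b0) * (N/P) ** b1."""
    N = np.asarray(vehicles, dtype=float)
    P = np.asarray(population, dtype=float)
    return P * np.exp(b0) * (N / P) ** b1


# Smeed's law as a special case: 0.0003 * (N * P**2) ** (1/3) = P * 0.0003 * (N/P) ** (1/3)
SMEED_B0, SMEED_B1 = np.log(0.0003), 1.0 / 3.0

# Hypothetical, noise-free data generated from b0 = -8.3, b1 = 0.32
P = np.linspace(1.0e7, 2.0e7, 8)
N = np.linspace(2.0e5, 1.5e6, 8)
D = predict_deaths(-8.3, 0.32, N, P)
b0, b1 = fit_modified_smeed(D, N, P)
print(round(b0, 4), round(b1, 4))  # -8.3 0.32
```

Because the two regions' fitted lines differ only through b0 and b1, the same exposure x can give different predicted fatality rates, which is the sense in which the parameters act as risk indicators.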

Bayesian Approach to Estimation of Regression Parameters
In this section, we use the modified Smeed's model to develop a Bayesian approach for deriving an estimator, based on a given prior distribution, that is robust with respect to the normality assumption of the model.
It is assumed that the unknown parameter vector is a value of a multivariate random variable with a multivariate prior distribution. The regression coefficients β0, β1, ... can take any value from -∞ to +∞, so the domain of the prior distribution must be the whole set of real numbers. This restricts the choice to distributions that can take both negative and positive values. Therefore, the bivariate Normal, Laplace and Cauchy distributions are considered the most suitable prior distributions.
Two Bayesian methods were used to estimate the parameters in Equation (9): the conjugate prior method and the maximum a posteriori method, which are discussed below.

¹ National Road Safety Commission of Ghana (2011). Building and Road Research Institute (BRRI), Road Traffic Crashes in Ghana, Statistics.

Conjugate Prior
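In this section, we assume that the random vector Y, with components y_i in Equation (9), is normally distributed with mean Xβ and covariance σ²I. The likelihood function is therefore normal. Since the normal distribution is conjugate to itself with respect to a normal likelihood, choosing a bivariate normal prior for β ensures that the posterior distribution is also normal. The conditional p.d.f. of Y is then

$$f(\mathbf{y}\mid\boldsymbol\beta) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(\mathbf{y}-X\boldsymbol\beta)^\top(\mathbf{y}-X\boldsymbol\beta)\right\},$$

and the prior p.d.f. of β is

$$p(\boldsymbol\beta) = \frac{1}{2\pi\,|\Sigma|^{1/2}}\exp\left\{-\frac12(\boldsymbol\beta-\boldsymbol\mu)^\top\Sigma^{-1}(\boldsymbol\beta-\boldsymbol\mu)\right\}, \qquad (12)$$

where μ and Σ are the prior mean vector and covariance matrix of β.

The posterior distribution can therefore be expressed as

$$p(\boldsymbol\beta\mid\mathbf{y}) \propto f(\mathbf{y}\mid\boldsymbol\beta)\,p(\boldsymbol\beta) \propto \exp\left\{-\frac12\left[\frac{(\mathbf{y}-X\boldsymbol\beta)^\top(\mathbf{y}-X\boldsymbol\beta)}{\sigma^2} + (\boldsymbol\beta-\boldsymbol\mu)^\top\Sigma^{-1}(\boldsymbol\beta-\boldsymbol\mu)\right]\right\}. \qquad (13)$$

The function under the exponent in Equation (13) can be written as

$$-\frac12(\boldsymbol\beta-\hat{\boldsymbol\beta})^\top\hat\Sigma_{\beta}^{-1}(\boldsymbol\beta-\hat{\boldsymbol\beta}) + v, \qquad \hat\Sigma_{\beta} = \left(\Sigma^{-1} + \frac{X^\top X}{\sigma^2}\right)^{-1}, \qquad \hat{\boldsymbol\beta} = \hat\Sigma_{\beta}\left(\Sigma^{-1}\boldsymbol\mu + \frac{X^\top\mathbf{y}}{\sigma^2}\right),$$

where v is a constant term independent of β. The standard error of the i-th coefficient, based on the Bayesian estimate β̂, is the square root of the i-th diagonal element of Σ̂_β.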
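The conjugate update above can be sketched in a few lines of NumPy. This sketch assumes σ² is known (here it is plugged in from the least squares residuals, which is an assumption, not necessarily how the study obtained it), and the data are synthetic and hypothetical. It shows two properties of the update: a very diffuse prior reproduces the least squares estimates, and an informative prior shrinks the standard errors.

```python
import numpy as np


def conjugate_normal_posterior(X, y, sigma2, mu0, Sigma0):
    """Normal-normal conjugate update for y ~ N(X beta, sigma2 * I), beta ~ N(mu0, Sigma0).

    sigma2 is treated as known. Returns (posterior mean, posterior covariance,
    standard errors = square roots of the covariance diagonal).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_prec = np.linalg.inv(Sigma0)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)
    post_mean = post_cov @ (prior_prec @ mu0 + X.T @ y / sigma2)
    return post_mean, post_cov, np.sqrt(np.diag(post_cov))


# Hypothetical data; sigma2 is plugged in from the OLS residuals (an assumption)
rng = np.random.default_rng(0)
x = np.linspace(-4.0, -2.5, 19)
y = -8.3 + 0.32 * x + rng.normal(0.0, 0.03, size=x.size)
X = np.column_stack([np.ones_like(x), x])
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ ols) ** 2) / (len(y) - 2)

# A very diffuse prior reproduces the least squares estimates
mean_diffuse, _, se_diffuse = conjugate_normal_posterior(X, y, sigma2, np.zeros(2), 1e6 * np.eye(2))

# An informative prior (hypothetical mean and covariance) reduces the standard errors
mean_inf, _, se_inf = conjugate_normal_posterior(X, y, sigma2, ols, np.diag([0.01, 0.001]))
```

Adding prior precision can only decrease the posterior variances, which is consistent with the small conjugate prior standard errors reported later.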

Maximum a Posteriori Method
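The goal here is to find the parameter estimates that maximize the posterior probability of the parameters given the data, that is,

$$\hat{\boldsymbol\beta}_{\text{MAP}} = \arg\max_{\boldsymbol\beta}\, p(\boldsymbol\beta\mid\mathbf{y}) = \arg\max_{\boldsymbol\beta}\, f(\mathbf{y}\mid\boldsymbol\beta)\,p(\boldsymbol\beta).$$

For non-conjugate priors such as the Laplace and Cauchy, this posterior has no convenient closed form, so we resort to sampling techniques, such as Markov chain Monte Carlo (MCMC), to draw samples from it. The multivariate Metropolis-Hastings procedure described by Steyvers (2011) is used for this purpose.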
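A minimal sketch of a component-wise random-walk Metropolis sampler for this posterior is shown below, written in Python rather than the MATLAB used in the study. Several choices here are assumptions: the priors are independent per coefficient (not the bivariate priors of the study) with hypothetical location 0 and scale 10, σ² is treated as known, the chain starts at the least squares estimate, and the MAP estimate is approximated by the sampled point with the highest log posterior. The proposal step sizes are hypothetical and would need tuning for real data.

```python
import numpy as np


def log_prior(beta, kind, loc=0.0, scale=10.0):
    """Independent priors on each coefficient, up to an additive constant."""
    z = (beta - loc) / scale
    if kind == "normal":
        return -0.5 * np.sum(z ** 2)
    if kind == "laplace":
        return -np.sum(np.abs(z))
    if kind == "cauchy":
        return -np.sum(np.log1p(z ** 2))
    raise ValueError(f"unknown prior: {kind}")


def log_posterior(beta, X, y, sigma2, kind):
    resid = y - X @ beta
    return -0.5 * (resid @ resid) / sigma2 + log_prior(beta, kind)


def componentwise_metropolis(X, y, sigma2, kind="normal", steps=(0.01, 0.003),
                             n_iter=5000, seed=0):
    """Component-wise random-walk Metropolis sampler; returns (MAP sample, draws)."""
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # start at the OLS estimate
    lp = log_posterior(beta, X, y, sigma2, kind)
    draws = np.empty((n_iter, beta.size))
    log_posts = np.empty(n_iter)
    for t in range(n_iter):
        for k in range(beta.size):                   # update one coefficient at a time
            proposal = beta.copy()
            proposal[k] += rng.normal(0.0, steps[k])  # symmetric random-walk proposal
            lp_new = log_posterior(proposal, X, y, sigma2, kind)
            if np.log(rng.uniform()) < lp_new - lp:   # Metropolis acceptance rule
                beta, lp = proposal, lp_new
        draws[t] = beta
        log_posts[t] = lp
    # MAP approximated by the sampled point with the highest log posterior
    return draws[np.argmax(log_posts)], draws


# Hypothetical data, not the Ghana series
rng = np.random.default_rng(1)
x = np.linspace(-4.0, -2.5, 19)
y = -8.3 + 0.32 * x + rng.normal(0.0, 0.03, size=x.size)
X = np.column_stack([np.ones_like(x), x])
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ ols) ** 2) / (len(y) - 2)

map_estimates = {kind: componentwise_metropolis(X, y, sigma2, kind)[0]
                 for kind in ("normal", "laplace", "cauchy")}
```

With a diffuse prior and 19 observations, all three priors give MAP estimates close to least squares, which mirrors the agreement reported for the full sample.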

Multilevel Random Coefficient (MRC) Model
In this section, we develop a multilevel analysis approach to estimate the regional distribution of the parameters of the modified Smeed's model and use it to compare the risk of RTFs across geographical regions.
If the intercept variance τ0 differs significantly from 0, then the parameters of the modified Smeed's model can be used to compare the risk of RTFs across the J geographical regions.
Equating the partial derivatives of the likelihood function to zero, we obtain the maximum likelihood estimators of the fixed effects and variance components of the model.

Validation of Method Using Data from Ghana
In this section, data from Ghana are used (1) to validate the Bayesian method and assess the robustness of the model, and (2) to validate the multilevel method and compare the risk of RTFs across the 10 geographical regions.

Validation of Bayesian Method
(i) Conjugate Prior Method
Table A2, in the Appendix, gives the estimated population size, the number of motor vehicles and the number of road traffic fatalities in Ghana (1991-2012). The distribution of ln(D/P), with a Shapiro-Wilk normality p-value of 0.201, is closer to the normal distribution than that of ln(D), with a corresponding p-value of 0.086. This supports the preference for the logarithmic transformation in Equation (8).
The 19 jackknife sample estimates of β0 and β1, derived from the national values of y_i and x_i in Table A2, are given in Table A3. Based on Equations (19) and (20), jackknife estimates of the mean vector and covariance matrix of the random vector β were computed. It can be seen from Table 2 that the estimated coefficients β0 and β1 are almost the same for the least squares and conjugate prior methods, and both methods give the same coefficient of determination, R². The conjugate prior estimates, however, have much smaller standard errors, which makes the conjugate prior method preferable.
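The jackknife step can be sketched as follows: drop one observation at a time, refit the regression, and summarize the leave-one-out estimates by their mean and covariance. This sketch uses the standard jackknife covariance with factor (n - 1)/n; Equations (19) and (20) of the study may use a different scaling, so treat that factor as an assumption. The resulting mean and covariance could serve as μ and Σ in the conjugate prior.

```python
import numpy as np


def jackknife_prior(x, y):
    """Leave-one-out OLS estimates of (b0, b1) and their jackknife mean and covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    estimates = np.empty((n, 2))
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        X = np.column_stack([np.ones(keep.sum()), x[keep]])
        estimates[i], *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    mean = estimates.mean(axis=0)
    centered = estimates - mean
    cov = (n - 1) / n * centered.T @ centered        # standard jackknife covariance
    return mean, cov, estimates


# Hypothetical data: noise-free and noisy versions of the same line
x = np.linspace(-4.0, -2.5, 19)
mean_exact, cov_exact, est_exact = jackknife_prior(x, -8.3 + 0.32 * x)

rng = np.random.default_rng(2)
mean_noisy, cov_noisy, _ = jackknife_prior(x, -8.3 + 0.32 * x + rng.normal(0.0, 0.03, x.size))
```

For noise-free data every leave-one-out fit is identical, so the jackknife covariance collapses to zero; with noise it becomes a symmetric, positive semidefinite matrix.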

(ii) Maximum a posteriori method
Our objective here is to determine the parameter estimates that maximize the posterior distribution given the data with respect to the bivariate Normal, Laplace and Cauchy prior distributions.

Bivariate Normal prior distribution
The prior distribution is the bivariate normal density in Equation (12). The Metropolis-Hastings algorithm is used to estimate the values of β0 and β1, and the MATLAB code implementing the component-wise Metropolis sampler for the posterior distribution is given in Listings 1 and 2 in the Appendix.
Table 2 shows the estimated values of β0 and β1 based on the least squares, conjugate prior and maximum a posteriori methods. The estimated coefficients are almost the same for all three methods.
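Bivariate Laplace prior distribution
With the bivariate Laplace prior, the posterior distribution is proportional to the normal likelihood multiplied by the Laplace prior density. Using the Metropolis-Hastings algorithm, the maximum a posteriori estimates of β0 and β1 are -8.320085 and 0.317051, respectively, with standard errors of 0.039047 and 0.010450.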
Bivariate Cauchy prior distribution
With the bivariate Cauchy prior, the component-wise Metropolis-Hastings sampler implemented in the MATLAB code gave maximum a posteriori estimates of β0 and β1 of -8.312857 and 0.317400, respectively.
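Why the prior matters at small sample sizes but not at larger ones can be illustrated with the simplest conjugate case: a normal mean with known variances. The posterior mean is a precision-weighted average of the sample mean and the prior mean, and the weight on the data grows with n. This is only an analogy for the Laplace and Cauchy cases, which have no such closed form, and the variances used below are hypothetical.

```python
def data_weight(n, sigma2, tau2):
    """Weight on the sample mean in a normal-normal posterior mean (known variances).

    posterior mean = w * sample_mean + (1 - w) * prior_mean, with
    w = (n / sigma2) / (n / sigma2 + 1 / tau2).
    """
    data_precision = n / sigma2
    prior_precision = 1.0 / tau2
    return data_precision / (data_precision + prior_precision)


# Hypothetical variances: sigma2 = 1.0 (per observation), tau2 = 0.2 (prior)
for n in (5, 10, 15, 19):
    print(n, round(data_weight(n, sigma2=1.0, tau2=0.2), 3))
# 5 0.5
# 10 0.667
# 15 0.75
# 19 0.792
```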
The resulting posterior Bayesian estimates for the Normal, Laplace and Cauchy prior distributions are summarized in Table 3. With a sample size of 19, the posterior Bayes estimates are reasonably consistent across the three priors. Table 4 shows the posterior Bayesian estimates of β0 and β1 at four different sample sizes (5, 10, 15 and 19) for the Normal, Laplace and Cauchy priors. At sample sizes of 5 and 10, the estimates are not consistent across the three priors; that is, they are sensitive to the choice of prior. At a sample size of 15 or more, the estimates become insensitive to the prior, because the relative influence of the prior decreases while that of the data increases. The posterior Bayesian estimates based on the Laplace prior are also reasonably consistent across all four sample sizes. Even at a sample size of 5, where the normality assumption was violated, the estimates based on the Laplace prior were robust. Thus, the Laplace prior is preferred when the sample size is small.

Table 4. Bayesian estimates with respect to sample size and prior distribution

Sample size | Normal | Laplace | Cauchy

The multilevel method is validated using data for the ten regions of Ghana. Instead of estimating a separate regression equation for each of the 10 regions, we wish to determine a single model for estimating the regional distribution of RTFs. The collection of regression parameters {β1, β2, ..., β10} is assumed to be a random sample of size 10 from a population whose distribution depends on the fixed effects γ00 and γ10, the variance components τ0 and τ1, the covariance τ01 and the level-1 variance σ², where βj = (β0j, β1j), j = 1, 2, ..., 10. Equations (21), (22) and (23) can be written as

$$y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + r_{ij}, \qquad \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j},$$

where u0j and u1j have variances τ0 and τ1 and covariance τ01, and r_ij has variance σ².
Three models are considered in the next section.

(i) The Unconditional Means Model, M0
An unconditional means model does not contain any predictors, but includes a random intercept variance term for groups.
In this section, we examine whether there is significant intercept variation, τ0. If τ0 does not differ significantly from 0, there may be little reason to use random coefficient modeling, since simpler ordinary least squares (OLS) modeling will suffice. With no predictor, Equation (34) reduces to $y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$, so that $\beta_{0j} = \gamma_{00} + u_{0j}$.
Application of the nlme package in R, using data in Table A3, shows that there is significant intercept variation in terms of y scores across the 10 regions.
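A rough numerical check of intercept variation in the unconditional means model can be made with the one-way random-effects ANOVA (method-of-moments) estimator sketched below. This is a swapped-in technique, not the maximum likelihood fit that nlme performs, and it assumes balanced data (the same number of observations in every region). The F ratio would be compared with an F distribution on (J - 1, J(n - 1)) degrees of freedom. The two-region example is hypothetical.

```python
import numpy as np


def anova_intercept_variance(groups):
    """Method-of-moments estimate of the between-region intercept variance (tau0).

    groups: sequence of equal-length 1-D arrays of y scores, one per region.
    Returns (tau0_hat, sigma2_hat, F), where F = MSB / MSW tests H0: tau0 = 0
    against an F distribution with (J - 1, J * (n - 1)) degrees of freedom.
    """
    data = np.array([np.asarray(g, dtype=float) for g in groups])    # shape (J, n)
    J, n = data.shape
    region_means = data.mean(axis=1)
    grand_mean = data.mean()
    msb = n * np.sum((region_means - grand_mean) ** 2) / (J - 1)      # between regions
    msw = np.sum((data - region_means[:, None]) ** 2) / (J * (n - 1))  # within regions
    tau0_hat = max((msb - msw) / n, 0.0)                               # truncated at 0
    return tau0_hat, msw, msb / msw


# Tiny hypothetical example with two "regions" of three observations each
tau0_hat, sigma2_hat, F = anova_intercept_variance([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(round(tau0_hat, 4), sigma2_hat, F)  # 4.1667 1.0 13.5
```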
The maximum likelihood estimates of the parameters, obtained from the data in Table A3 with the nlme package in R, are given in Table 5.
(iii) Random slope model, M2
In this section, we continue our analysis by trying to explain the third source of variation, namely the variation in the slope, τ1. The model that we test is $y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + u_{0j} + u_{1j}x_{ij} + r_{ij}$.

The last two columns of Table 7 give the mean RTFs per 100 accidents and RTFs per 100 casualties for each region from 1991 to 2009. The risk of dying as a result of a road traffic accident in Greater Accra is relatively low, with an average of 5.7 RTFs per 100 accidents; that is, about 6 people die for every 100 road traffic accidents in Greater Accra (Hesse and Ofosu, 2015). The p-values in Table 8 show that there is a strong positive correlation between the parameter estimates of the modified Smeed's model and the fatality indices. Thus, the parameter estimates β0 and β1 of the modified Smeed's model can be used as risk indicators of RTFs in Ghana.

Based on the modified Smeed's model of this study, the Bayesian method developed with the Laplace prior was found to be robust to violation of the normality assumption of the model. Using data from Ghana, the sensitivity of the Bayesian estimates to the Normal, Laplace and Cauchy priors was assessed at different sample sizes. At a sample size of 15 or more, the estimates become insensitive to the prior. The posterior Bayesian estimates based on the Laplace prior are consistent across all four sample sizes, and at a sample size of 5 they were robust to violation of the normality assumption of the model.
The parameter estimates of the modified Smeed's model can be used as risk indicators of RTFs across geographical regions, provided there is significant intercept variation, τ0, of the regression equation across the regions. Using data from Ghana, it was shown that the parameter estimates β0 and β1 across the 10 geographical regions can be used as risk indicators of RTFs in Ghana. On this basis, the three Northern regions and the Brong-Ahafo region have the highest risk of RTFs.

Table A1, in the Appendix, is an extract from the list of countries ranked by the number of road motor vehicles per 1,000 inhabitants. For every country in the world except San Marino, the number of registered vehicles in use, N, is less than the population size, P. Since N < P in almost all situations, it follows that the multiplicative error term u in the modified Smeed's model of this study is smaller than that of Smeed's original model, making the modified Smeed's model preferable.

Table 2. Comparison of coefficients of the least squares, conjugate prior and maximum a posteriori methods

Table 3. Posterior Bayesian estimates for different priors with a sample size of 19

Table 5 presents the parameter estimates and standard errors for models M0, M1 and M2. All the standard errors of the estimated parameters in model M2 are smaller than the corresponding values in model M1. Moreover, the deviance, which measures model misfit, is much lower for M2 than for M1 (Hesse et al., 2014b). Thus, the parameter estimates based on model M2 are preferred.
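The deviance comparison between M1 and M2 can be turned into a likelihood-ratio test, sketched below. This assumes that M2 adds exactly two parameters to M1 (the slope variance τ1 and the covariance τ01) and that both models are fitted by maximum likelihood; for a chi-square variable with 2 degrees of freedom the survival function is exp(-x/2). Because a variance is being tested at its boundary value of zero, this naive p-value is conservative. The deviances below are hypothetical, not the values in Table 5.

```python
import math


def lr_test_two_extra_params(deviance_reduced, deviance_full):
    """Likelihood-ratio (deviance difference) test with 2 degrees of freedom.

    For a chi-square variable with 2 df the survival function is exp(-x / 2).
    """
    stat = deviance_reduced - deviance_full
    p_value = math.exp(-stat / 2.0) if stat > 0 else 1.0
    return stat, p_value


# Hypothetical deviances, not the values in Table 5
stat, p = lr_test_two_extra_params(deviance_reduced=10.0, deviance_full=2.0)
print(stat, round(p, 4))  # 8.0 0.0183
```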

Table 5. Comparison of models M0, M1 and M2

Table 6. Estimates of regional-level residuals and the values of β0 and β1

According to the National Road Safety Commission (NRSC) of Ghana 2011 report, two key national road traffic fatality indices required for characterizing and comparing the extent and risk of traffic fatalities across the ten geographical regions of Ghana are RTFs per 100 accidents and RTFs per 100 casualties.

Table 7. Parameter estimates and fatality indices

We wish to determine whether a strong positive correlation exists between the parameter estimates of the modified Smeed's model and the fatality indices based on the NRSC definition of risk. The p-values in Table 8 are used for this test.
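The correlation test behind Table 8 can be sketched as follows: compute the Pearson correlation between a parameter estimate and a fatality index across regions, then form the t statistic $t = r\sqrt{(n-2)/(1-r^2)}$ with n - 2 degrees of freedom. With n = 10 regions, a two-sided 5% test rejects zero correlation when |t| exceeds 2.306. The function names and the value r = 0.8 are hypothetical; the study's own correlation values are those in Table 8.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# With n = 10 regions, a two-sided 5% test rejects H0 when |t| > 2.306 (t, 8 df)
print(round(correlation_t(0.8, 10), 3))  # 3.771
```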

Table 8. Correlation coefficients

The multiplicative error term u in the modified Smeed's model of this study was found to be smaller than that of Smeed's original model, making the modified Smeed's model preferable. Using data from Ghana, it was confirmed that the modified Smeed's model of this study is more accurate in estimating RTFs in Ghana than the original Smeed equation.
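In this study, a statistical methodology based on a modified Smeed's model for assessing the risk of RTFs across sub-populations of a given geographical zone has been developed.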