Cross-Sectional and Time Series Data as the Basis for Panel Modelling: The Case of Kidnappings in México From 2010 to 2019

This paper presents the elements involved in building a panel data model from both cross-sectional and time-series dimensions, as well as the assumptions required to apply the model. The objective is to focus on the main elements of panel data modelling: how the model is built, how its parameters are estimated, and how it is validated. Following the operations research methodology, a practical exercise is carried out to estimate the number of kidnapping cases in Mexico from several economic indicators. Of the two types of panel data models analyzed in this research, the best fit is obtained with the random-effects model, and the most significant variables are Gross Domestic Product growth and the informal employment rate for the period 2010 to 2019 in each of the states. Thus, it is illustrated that panel data modelling provides a better fit of the data than other types of models, such as linear regression and time series analysis.


Introduction
Nowadays, social, economic, financial and biological phenomena, among others, show largely complex behaviors, mainly due to the structure of their data, which tend to be either cross-sectional data (the phenomenon evaluated at a certain period of time) or time-series data (the phenomenon evaluated over time). According to Lavado (2012):

Cross-sectional data: $Y_i = \beta_0 + \beta_1 X_i + u_i$, where i denotes each observed object (individual unit) at a specific moment in time. (1)

Time-series data: $Y_t = \beta_0 + \beta_1 X_t + u_t$, where t denotes a specific moment in time. (2)

Chart 1. Data types. Source. Econometría de corte transversal (Lavado, 2012)

Examples of these types of data include:
• The estimation of gasoline prices during the period 2000-2018, taking as a reference the crude oil price and the economic growth in that period.
• The growth of a plant over a period of 125 days as a function of the quantity of fertilizer and water applied, as well as the time of exposure.
• The variation of global temperature over the last 150 years as a function of greenhouse gas emissions and economic growth.
In view of these phenomena, the main goal of this paper is to present the elements of panel data models: how they are built, how their parameters are estimated, and how they are validated. To fulfill this goal, a practical application is presented to estimate the number of kidnapping cases in Mexico from 2010 to 2019, taking as a frame of reference different economic indicators such as Gross Domestic Product (GDP), economic growth, the unemployment rate and the employment informality rate in each of the Mexican states.
The application follows the operations research methodology, which comprises the following phases (Ackoff & Sasieni, 1977):
• Phase I: Mathematical formulation: a linear association is set out between the dependent and independent variables.
• Phase II: Model estimation: the parameters are estimated under two models, the fixed-effects model (MFE) and the random-effects model (MRE), and the better-fitting model is selected by means of hypothesis testing. In addition, the significance of the parameters is assessed.
• Phase III: Model validation: to be validated, the model must fulfill the following assumptions: normally distributed residuals, homoscedasticity, and no collinearity among the independent variables.
• Phase IV: Interpretation of results: once the model has been validated, its parameters are interpreted and projections of the phenomenon are made.
For this practical exercise, RStudio was used, since its programming language (R) made it possible to obtain the results more efficiently.

Theoretical Background
Panel data models arise when information on a phenomenon is recorded over time for a sample of individual units; in other words, there is a variable $Y_{it}$, where $i = 1, 2, 3, \dots, N$ indexes the observed objects and $t = 1, 2, 3, \dots, T$ indexes the periods of time (Arellano, 1991):

[Chart: layout of a panel data set, with each object observed across periods t = 1, 2, 3. Source. Modelos de Datos Panel (Albarrán, 2010)]

According to Lavado (2012), its mathematical expression is:

$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$ (3)

where:
• $Y_{it}$ is the observed value of the phenomenon under study for object i at a specific point in time t.
• $X_{it}$ is the independent variable, which may affect the behavior of the phenomenon under observation for object i at a specific point in time t.
• $u_{it}$ is the error term, that is, the part of $Y_{it}$ that the linear association between Y and X cannot explain.
• $\beta_j$, j = 0, 1, are the parameters to estimate through the method of least squares.¹

The main purpose of panel data models is to capture unobserved heterogeneity, which traditional regression models do not take into account and whose omission can bias the estimation of the phenomenon under study. Panel data models are classified into Models of Fixed Effects (MFE) and Models of Random Effects (MRE).
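As an illustration of the least-squares step, the following minimal Python sketch (not the authors' R code) estimates $\beta_0$ and $\beta_1$ in $Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$ by pooling every object-period pair. The function name `pooled_least_squares` and the `(i, t, x, y)` tuple layout are hypothetical choices for this example. Pooling ignores the object index i, which is precisely the unobserved heterogeneity that the MFE and MRE are designed to capture.

```python
def pooled_least_squares(panel):
    """Estimate b0, b1 in y = b0 + b1*x + u by ordinary least squares.

    panel: list of (i, t, x, y) tuples; the object index i and the period t
    are ignored, so every observation is treated as one homogeneous sample.
    """
    xs = [x for _i, _t, x, _y in panel]
    ys = [y for _i, _t, _x, y in panel]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Closed-form OLS: slope = S_xy / S_xx, intercept from the sample means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Toy panel: two objects observed over two periods, lying exactly on y = 2 + 3x.
toy = [(1, 1, 1.0, 5.0), (1, 2, 2.0, 8.0), (2, 1, 3.0, 11.0), (2, 2, 4.0, 14.0)]
print(pooled_least_squares(toy))  # prints (2.0, 3.0)
```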
In the MFE, it is assumed that the differences among the objects of study can be captured through differences in the constant term, which are deterministic. According to Baronio & Vianco (2014):

$\operatorname{cov}(X_{it}, Z_i) \neq 0$ (4)

such that:

$\mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{i}\,\alpha_i + \mathbf{u}_i$ (5)

where:
• $\mathbf{i}$ is a column vector of ones.
• $\alpha_i$ is the constant term specific to object i, which absorbs the effect of the unobserved characteristics $Z_i$.

The drawback of this method appears with large samples: the object effect is eliminated by expressing the variables as deviations from each object's temporal mean, and as a consequence the effect of time-invariant variables cannot be analyzed.
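The deviation-from-the-mean idea can be sketched in a few lines of Python. This is a hedged illustration of the standard within (fixed-effects) estimator for a single regressor, not the routine used in the paper; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def within_slope(panel):
    """Fixed-effects (within) slope for y_it = alpha_i + b*x_it + u_it.

    panel: list of (i, t, x, y) tuples. Each variable is expressed as a
    deviation from its object's temporal mean, which removes alpha_i.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # i -> [sum x, sum y, count]
    for i, _t, x, y in panel:
        totals[i][0] += x
        totals[i][1] += y
        totals[i][2] += 1
    means = {i: (sx / n, sy / n) for i, (sx, sy, n) in totals.items()}

    num = 0.0
    den = 0.0
    for i, _t, x, y in panel:
        mx, my = means[i]
        num += (x - mx) * (y - my)
        den += (x - mx) ** 2
    return num / den


# Toy panel: object 1 has alpha = 10, object 2 has alpha = 0, common slope 2.
toy = [
    (1, 1, 1.0, 12.0), (1, 2, 2.0, 14.0), (1, 3, 3.0, 16.0),
    (2, 1, 4.0, 8.0), (2, 2, 5.0, 10.0), (2, 3, 6.0, 12.0),
]
print(within_slope(toy))  # prints 2.0
```

On this toy panel, a pooled least-squares fit of the same points gives a slope of about -0.57, because object 2 has larger x values but a smaller intercept. Any regressor that is constant within an object equals its object mean in every period, so its deviation is zero and its coefficient cannot be estimated, which is the limitation of the method for time-invariant variables.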
In the MRE, it is considered that the individual effects are not independent of one another; rather, they are randomly distributed around a given value. These models take into account not only the impact of the independent variables but also the specific features of each cross-sectional unit. According to Baronio & Vianco (2014), the model is expressed as:

$Y_{it} = \beta_0 + \beta_1 X_{it} + u_i + \varepsilon_{it}$

where:
• $u_i$ is the random disturbance that distinguishes the effect of each individual in the panel.
• $\varepsilon_{it}$ is the idiosyncratic error term.

For the purpose of its estimation, the stochastic components are grouped so that, according to Torres (2007), the model becomes:

$Y_{it} = \beta_0 + \beta_1 X_{it} + w_{it}, \qquad w_{it} = u_i + \varepsilon_{it}$

According to Labra & Torrecillas (2014), these models assume that the individual effects are not correlated with the independent variables:

$\operatorname{cov}(X_{it}, u_i) = 0$

where $u_i$ are the individual effects and $X_{it}$ are the independent variables.

To decide which model fits better, the Hausman test is used. It compares the coefficient vectors $\hat{\beta}$ obtained by the MFE and MRE estimators in order to identify whether the differences between them are significant. On this basis, the hypotheses are the following (Ramoni & Orlandoni, 2013):

$H_0: \hat{\beta}_{MFE} = \hat{\beta}_{MRE}$ (the difference between the estimators is not significant)
$H_1: \hat{\beta}_{MFE} \neq \hat{\beta}_{MRE}$

If the p-value is higher than the significance level (α), H0 is not rejected: there is no correlation between the individual effects and the independent variables, and the random-effects estimator must be used.

Once the best-fitting model (MFE vs. MRE) has been selected, it must fulfill the following assumptions (Molina, Rodrigo, 2010):
• The residuals must be close to a normal distribution: $u_{it} \sim N(0, \sigma^2)$.
• If the Variance Inflation Factor (VIF) is greater than 5, there is high collinearity among the independent variables.

In light of the foregoing, an exercise is carried out that brings together all the tools for building panel data models, from a descriptive analysis of the information to the validation of the best-fitting model.

¹ The method of least squares consists in minimizing the sum of the squares of the vertical distances between the data values and the estimated regression, that is, minimizing the residual sum of squares, where a residual is the difference between an observed value and the value given by the model (Mendenhall, Wackerly & Scheaffer, 2008).

Construction of the Model
One of the main problems that Mexican society faces is insecurity, a phenomenon that has grown at an accelerated pace. Among its manifestations, kidnapping has been one of the most harmful practices. Based on the Mexican Legal Dictionary, and from the standpoint of criminal law, this activity is defined as follows (Cámara de Diputados, 2019): "The seizure and retention of a person for the purpose of obtaining a ransom in money or in goods; the term is also used as a synonym of plagio." Studies have shown that from 1970 to 1984 Mexico recorded very low numbers of kidnappings (300 cases). After this period, this activity accelerated sharply, to the point that in 2012 the Public Security Bureau reported more than 1,117 cases, and in 2019 the same agency reported more than 1,206 cases (Yam, Trujano, 2014).
According to the World Bank, developing countries where this illicit activity occurs show very unfavorable economic indicators for their populations: they are characterized by high unemployment rates, low economic growth, very weak tax collection and a high rate of informal economy, which in turn worsens human capital (González, 2012).
In this context, the aim is to estimate the degree of incidence of these economic variables on the number of kidnapping cases in each Mexican state during the period from 2010 to 2019, that is:

$Y_{Sjt} = \beta_0 + \beta_1 X_{1jt} + \beta_2 X_{2jt} + \beta_3 X_{3jt} + \beta_4 X_{4jt} + \beta_5 X_{5jt} + u_{jt}$

where:
• $Y_{Sjt}$ is the number of kidnapping cases in the j-th state at time t.
• $X_{1jt}$ is the Gross Domestic Product, in millions of Mexican pesos, of the j-th state at time t.
• $X_{2jt}$ is the economic growth rate, in percentage terms, of the j-th state at time t.
• $X_{3jt}$ is the Employment Informality Rate of the j-th state at time t.
• $X_{4jt}$ is the Unemployment Rate of the j-th state at time t.
• $X_{5jt}$ is the time elapsed for the j-th state at time t.
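Data of this kind are usually arranged in long format, one row per state and year. The following Python sketch shows one way to derive $X_{2jt}$ and $X_{5jt}$ from a state's GDP series. The growth-rate formula (year-over-year percentage change) and the convention that elapsed time starts at 0 in the first year are assumptions, since the paper does not state how these variables were computed, and the helper name `build_panel_rows` is hypothetical.

```python
def build_panel_rows(gdp_by_state, first_year):
    """Turn per-state GDP series into long-format (state, year) rows.

    Hypothetical helper: X2 is taken as the year-over-year percentage change
    of GDP and X5 as the number of years elapsed since first_year.
    """
    rows = []
    for state, series in gdp_by_state.items():
        for k, gdp in enumerate(series):
            growth = None if k == 0 else 100.0 * (gdp / series[k - 1] - 1.0)
            rows.append({
                "state": state,
                "year": first_year + k,
                "x1_gdp": gdp,
                "x2_growth_pct": growth,  # undefined for the first year
                "x5_elapsed": k,
            })
    return rows


# Illustrative numbers only: one state observed for three years.
rows = build_panel_rows({"State A": [100.0, 110.0, 121.0]}, first_year=2010)
for row in rows:
    print(row["year"], row["x2_growth_pct"], row["x5_elapsed"])
```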
Having identified the variables that, in theory, affect this phenomenon, its dynamics are estimated by following the phases described above: mathematical formulation, estimation of the parameters, validation, and interpretation of the results.

Mathematical Formulation
From the foregoing, the model to estimate is the one formulated above, estimated under both the fixed-effects and the random-effects specifications. When comparing both models, it can be observed that in the fixed-effects model the variables $X_{1jt}$ (GDP) and $X_{4jt}$ (Unemployment Rate) are significant at the 0.05 level, whereas in the random-effects model the variable $X_{1jt}$ is significant. However, the random-effects model shows a better fit, since its coefficient of determination (R²) is greater than that of the fixed-effects model. Therefore, the MRE is more suitable for predicting the dynamics of kidnapping cases in Mexico.
Chart 5. Hypothesis test to choose the model with the best fit. Source. Own elaboration.
This result can be confirmed through the Hausman test (Chart 5). The random-effects model is the most appropriate since, under the decision rule of the Hausman test, the null hypothesis of no correlation between the individual effects and the independent variables is not rejected at the 0.05 significance level.

Estimation of the Model
With respect to the random-effects model, after eliminating the non-significant variables and transforming the data to make their distribution more symmetric, the regression model is the following: Running the model yields the following results: Chart 6. Model with the best fit

Source. Own elaboration
Substituting the estimated parameters into equation 11 yields equation 12. At a 95 percent confidence level (significance level of 0.05), equation 12 accounts for 31.05 percent of the variability of the data (R² = 0.3105); that is, it explains 31.05 percent of the dynamics of kidnapping cases in Mexico.
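The 31.05 percent figure is a coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The Python sketch below computes this basic definition from observed and fitted values. It is illustrative only: panel software may report R² computed on transformed (for example, quasi-demeaned) data, so a value from this function need not match the figure reported in Chart 6 exactly.

```python
def r_squared(observed, fitted):
    """Share of the variability of `observed` accounted for by `fitted`."""
    n = len(observed)
    mean_y = sum(observed) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual SS
    ss_tot = sum((y - mean_y) ** 2 for y in observed)             # total SS
    return 1.0 - ss_res / ss_tot


# Toy values, for illustration only.
print(round(r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]), 4))  # prints 0.8
```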

Validation of the Model
Taking equation 12 as a reference, the model is validated by checking that it fulfills the following assumptions: Chart 7. Fulfillment of the assumptions

Source. Own elaboration
Chart 7 shows that the estimated model fulfills all the assumptions: its residuals are close to a normal distribution, are homoscedastic and are not correlated; at a significance level of 0.05, the p-value of each test is above the significance level.
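The paper does not specify which tests were used for these checks. As a hedged illustration of two of the assumptions described in the Theoretical Background, the Python sketch below computes the VIF for the two-regressor case, where the auxiliary R² equals the squared correlation between the regressors, and the Jarque-Bera normality statistic, one common choice for residual normality, whose p-value under a chi-square distribution with 2 degrees of freedom is exp(-JB/2). Homoscedasticity tests are not covered. Function names and sample values are hypothetical.

```python
import math


def vif_two_regressors(x1, x2):
    """VIF when there are two regressors: 1 / (1 - r^2), r = corr(x1, x2)."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)
    if r2 >= 1.0:
        return math.inf  # perfect collinearity
    return 1.0 / (1.0 - r2)


def jarque_bera(residuals):
    """Jarque-Bera statistic and its chi-square(2) p-value (asymptotic)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Toy values, for illustration only.
vif = vif_two_regressors([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
jb, p = jarque_bera([-1.0, 0.0, 1.0])
print(vif > 5, p > 0.05)  # prints False True
```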

Interpretation of Data
Based on equation 12, the back-transformed equation (in the original units of the data) is the following: According to this equation, its parameters are interpreted as follows:
• For each increase of 1,000 pesos in $X_{1jt}$ (Gross Domestic Product), one additional kidnapping case is expected in the country, holding the other variables constant.
• For each percentage point increase in $X_{2jt}$ (Employment Informality Rate), 9 additional kidnapping cases are expected in the country, holding the other variables constant.

This model provides sufficient evidence of the effects of the Gross Domestic Product and the Employment Informality Rate, the latter having the larger impact. Furthermore, according to this model, time is not a determinant of the behavior of the phenomenon under study.
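The interpretation of each parameter rests on the ceteris paribus reading of a linear equation. Assuming the equation used for interpretation is linear in its regressors, the change in the expected number of cases when only $X_{kjt}$ varies is:

$$\Delta \hat{Y}_{Sjt} = \hat{\beta}_k \, \Delta X_{kjt}, \qquad \Delta X_{mjt} = 0 \ \text{for } m \neq k$$

For instance, with the reported effect of 9 cases per percentage point of employment informality, an increase of 2 percentage points would correspond to $9 \times 2 = 18$ additional cases, all else being equal.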

Summary
As observed, building a panel data model requires information with both cross-sectional and time-series dimensions, with the aim of estimating the dynamics of a phenomenon with these features, which is often difficult to model through linear regression or time series analysis.
One of the advantages of this type of model is that it can handle heterogeneous objects. This is not possible with linear regression, which treats the information as homogeneous, or with time series analysis, which relies on the asymptotic properties of the temporal dimension and therefore needs a sufficiently large number of observations; failing to account for this heterogeneity lowers the goodness of fit.
Panel data models capture richer information about the phenomenon, since they collect observations on multiple objects of the phenomenon under study over specific periods of time.