A New Methodological Framework for Crime Spatial Analysis Using Local Entropy Map

High crime rates in major cities have always been a challenge for city managers and urban planners. Different methods have been proposed in recent decades to control and reduce crime. Considering the relationship between land use and crime, this study investigates the potential spatial dependency between pickpocketing and commercial land uses, banks, and bus and subway stations in Tehran. To analyze the spatial dependency, local entropy models and a nonparametric approach were used. Using ArcGIS (Esri), we created local entropy maps that show the significance level of each local region and allow interactive examination of significant local multivariate relationships. The results show a strong spatial correlation between the mentioned land uses and pickpocketing, and the parameters indicate a significantly clustered distribution. Furthermore, pickpocketing density at bus stops is significant at the high Bonferroni level. The results indicate that the spatial patterns of pickpocketing are related to land use in the study area.


Introduction
Crime mapping and the spatial analysis of crime began when police officers marked the locations of citizen calls on paper maps (Harries, 1999). More scientific research on urban crime geography has been under way since the mid-nineteenth century. In that era, social ecology helped scientists (especially Quetelet and Guerry) to develop the relationship between crime and place. In the early twentieth century, new methods of crime analysis were used in the Chicago school of social ecology, especially by Shaw and McKay. Over the last few decades, fast urban growth and the consequent rapid rise in crime have generated strong interest and progress in the spatial analysis of urban crime. Spatial crime analysis developed further with the use of Geographic Information Systems. In addition, theories such as situational crime prevention, crime prevention through environmental design and defensible space were established. Recently, different approaches and methods have been proposed in geostatistics and the spatial analysis of crime.
In crime spatial analysis, the distribution of crime is influenced by space and by urban characteristics such as land use and other infrastructural, social and economic factors. The main goal of this study is to determine the fundamental parameters that facilitate criminal activities and crime hotspots, and ultimately to reduce the crime rate. According to the framework of social ecology and environmental criminology, a number of individual and social interactions within the environment affect human behavior. In this context, one major issue in crime spatial analysis is to find the proper correlation between crime and urban land use.
Specific circumstances of space and time increase the probability of crime. Conversely, land uses and infrastructural characteristics can reduce criminal opportunities and illegitimate goals when a situational crime prevention approach is applied.
Tehran, the capital and largest city of Iran, faces several problems arising from its particular social, economic, structural, demographic and cultural situation. Previous studies indicate that the different types of burglary in Tehran are among the most important challenges for citizens and authorities. Spatial crime research relates crime occurrence to special-purpose structures and facilities such as high schools, taverns, bars, liquor stores, apartment buildings and public housing. Bus and subway stops are considered crime hotspots because they offer a favorable situation for criminals, who can wait for potential victims without arousing suspicion. Bus stops are crowded with anonymous people, who are the best targets for criminals. In addition, certain types of land use influence crime, such as liquor stores, taverns, pawn shops, pool halls, vacant lots and abandoned buildings, which are called "crime generators".
Recent studies indicate that an efficient way to control crime and increase urban security is to study the effect of space and land use type on crime patterns. The criminal act and the availability of suitable targets are considered two major conditions that are influenced by land use and by the ease of entering and leaving the environment. Numerous studies have analyzed crime using multivariate spatial analysis and hotspot analysis. Most of these studies have applied spatial statistics or geographically weighted regression (GWR). Cahill and Mulligan (2007) used GWR to analyze the violence rate as the dependent variable against a number of independent variables, such as a heterogeneity index, married families, multiple land use, residential stability, single-person households and population density, at the census block scale. Geocoded points were aggregated into census block polygons, so biased data were generated and the Modifiable Areal Unit Problem (MAUP) arose. In this situation, the results were unreliable and blurred. Malczewski and Poetz (2005) studied the relationship between socioeconomic indices and a residential burglary index. Here too, a data aggregation approach was used and the MAUP was ignored. Other research has studied the spatial and temporal patterns of crime after natural hazard events. The authors first detected clusters of crime incidents and then applied an exploratory analysis of the possible causes of the detected clusters. The problems described above therefore remain.
The aim of this study is to formalize a methodological framework for crime spatial analysis. We first applied a procedure to convert the raw data into reliable data, using a regionalization approach for this first phase. Afterwards, the data were used to extract multivariate spatial relationships. We focused on particular types of urban setting (subway and bus stops, and commercial land use) to elucidate the environmental factors that may create opportunities for crime at bus stops.
Understanding that criminal activities do not occur in random or unpredictable locations is an important factor in the spatial analysis of crime. Crime occurs in certain structures that are influenced by the landscape in which it occurs and by the psychological characteristics of the offender. Routine activity theory states that a crime requires three conditions: the availability of a motivated offender, a potential target (or victim), and the absence of a capable guardian. Suitable land use and appropriate time could be considered two additional factors.
Studying the relationship between crime and urban land use has a long history in environmental criminology research. Different urban land uses attract specific types of people to particular parts of the city. Land use policies can therefore be analyzed to explain why specific crime rates are high in one part of a city and low in another. Commercial land uses place potential criminals and targets in proximity, so they are associated with higher crime. This type of urban land use has a large amount of public space and a large number of strangers, so it is difficult to maintain informal social control.
To analyze the relationship between the different land uses and crime occurrence in the study area, the local entropy approach was used. We studied the correlation of bus stations, subway stations and bank branches with pickpocketing and identified the pickpocketing hotspots. A further aim of this study is to determine the efficiency of local entropy in analyzing social behaviors and anomalies in urban spaces. To this end, we compared the entropy outputs with the output of GWR, which is the most commonly preferred model for analyzing multivariate relationships among crime-relevant variables.

MAUP and Regionalization
In any type of geographic data analysis, the definition of the primary spatial unit (in other words, the scale of the analysis) has a direct effect on the results. A key issue in spatial studies is the exact definition of these primary areal or spatial units. The basic unit of study must be defined because, for instance, if we examine the relationship between income and crime rate once at the neighborhood level and again at the level of the urban area, different results will be obtained. This problem originates from the fact that raw data (e.g. census data) are released for census blocks. We therefore have to aggregate these data into neighborhood units, urban areas or any other basic unit for our study. Since the choice of these basic units follows no rule and is arbitrary, different results will be obtained.
Figure 1 shows a basic example of how the MAUP affects results. Consider each cell as a primary spatial unit, with two different aggregation schemes applied to these cells. When the data are aggregated as in scheme 1, the R² value increases considerably, and in scheme 2 R² increases again. In fact, the results of a spatial analysis depend on the scale at which the analysis is done. This problem, first identified by Gehlke and Biehl, is known as the modifiable areal unit problem (MAUP).
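The scale effect can be reproduced with a small synthetic experiment. The sketch below is only an illustration under stated assumptions: the 8 x 8 `income` and `crime` surfaces, the noise level and the two block shapes are invented for the example and are not the data or the schemes of Figure 1. It aggregates the same cells in different ways and reports R² for each.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def aggregate(grid, block_rows, block_cols):
    """Average a 2-D grid over non-overlapping blocks (one aggregation scheme)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            cells = [grid[r][c]
                     for r in range(r0, min(r0 + block_rows, rows))
                     for c in range(c0, min(c0 + block_cols, cols))]
            out.append(mean(cells))
    return out

# Hypothetical surfaces: a shared west-east trend plus independent cell noise.
rng = random.Random(42)
size = 8
income = [[c + rng.gauss(0, 3) for c in range(size)] for _ in range(size)]
crime = [[c + rng.gauss(0, 3) for c in range(size)] for _ in range(size)]

schemes = {"cells (1x1)": (1, 1), "blocks (2x2)": (2, 2), "columns (8x1)": (8, 1)}
for name, (br, bc) in schemes.items():
    r = pearson_r(aggregate(income, br, bc), aggregate(crime, br, bc))
    print(f"{name:14s} units = {len(aggregate(income, br, bc)):2d}  R^2 = {r * r:.2f}")
```

Because averaging removes cell-level noise but keeps the shared trend, coarser units tend to report a stronger relationship from exactly the same underlying data, which is the behavior the MAUP describes.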
This problem has mainly been addressed by zone design. Zone design methods (such as Max-P-Region) build regions or zones by aggregating and interchanging basic spatial units while optimizing an objective function. These regionalization algorithms share some basic features: they aggregate basic spatial units into a predefined number of regions while optimizing an aggregation function; the basic spatial units assigned to a region must be spatially connected; the maximum number of regions is one less than the number of basic spatial units; a basic spatial unit can be assigned to only one region; and each region must contain at least one basic spatial unit. Another characteristic is that they are supervised, because the analyst must define the relevant variables, the number of regions and the type of objective function. Zone design methods have also been called spatially constrained aggregation methods, spatial clustering or regionalization.
The Max-P-Region algorithm is a spatially constrained regionalization method that groups n primary units into an unknown number of regions while maximizing the homogeneity of the variables within each region. It creates the maximum number of spatially connected regions subject to a minimum population threshold for each region. It was first designed by Duque et al. (2012) with a heuristic and integer programming approach. One of its useful features is that it uses the characteristics of the data to shape the boundaries of the new regions and does not impose a compactness constraint on them. This characteristic is well suited to the local entropy model described in the next section, which has a detective approach and tries to discover hidden patterns in spatial data. Another useful feature of the Max-P-Region algorithm is that it does not require the number of new regions to be defined. New regions are therefore formed by merging the minimum number of primary spatial units, and the number of new regions can be high in comparison with other algorithms applied to the same data. In this situation the loss of data is minimized and the aggregation bias decreases, so statistical inferences on the new regions are more reliable.
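The idea of threshold-constrained, spatially connected regions can be illustrated with a much simpler procedure than Max-P-Region itself. The sketch below is a greedy region-growing heuristic written for this illustration only; it is not the algorithm of Duque et al. (2012), it does not optimize homogeneity, and the function name `grow_regions`, the rook adjacency and the example counts are all assumptions.

```python
def grow_regions(counts, threshold):
    """Greedy sketch: group grid cells into rook-connected regions whose
    summed count reaches `threshold`. Returns (label grid, number of regions)."""
    rows, cols = len(counts), len(counts[0])
    label = [[None] * cols for _ in range(rows)]

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if label[r][c] is not None:
                continue
            # Start a tentative region at the first unassigned cell.
            members, total = [(r, c)], counts[r][c]
            label[r][c] = n_regions
            frontier = [(r, c)]
            while total < threshold and frontier:
                fr, fc = frontier.pop(0)
                for rr, cc in neighbours(fr, fc):
                    if label[rr][cc] is None and total < threshold:
                        label[rr][cc] = n_regions
                        members.append((rr, cc))
                        frontier.append((rr, cc))
                        total += counts[rr][cc]
            if total >= threshold:
                n_regions += 1
            else:
                # Threshold not reachable: mark the cells as enclaves for now.
                for rr, cc in members:
                    label[rr][cc] = -1

    # Attach enclaves to an adjacent valid region until nothing changes.
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if label[r][c] == -1:
                    for rr, cc in neighbours(r, c):
                        if label[rr][cc] >= 0:
                            label[r][c] = label[rr][cc]
                            changed = True
                            break
    return label, n_regions

# Hypothetical incident counts per grid cell.
counts = [[0, 3, 0, 6],
          [2, 0, 5, 0],
          [0, 7, 0, 1],
          [4, 0, 2, 3]]
labels, n_regions = grow_regions(counts, threshold=10)
for row in labels:
    print(row)
```

The sketch keeps the two constraints discussed above (connectivity and a minimum total per region) and, like Max-P-Region, does not ask for the number of regions in advance. A real implementation would additionally search for the partition with the most regions and the lowest within-region heterogeneity.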

Multivariate Spatial Analysis
Resolving complex spatial problems often requires a scientific understanding of how dependent and independent variables affect each other across the land surface. Multivariate spatial analysis of crime together with socioeconomic, demographic and environmental factors leads to the detection of patterns in space. There is a large body of literature in this context under the Exploratory Spatial Data Analysis (ESDA) paradigm. ESDA is a collection of methods that help in visualizing spatial data, discovering crime clusters in space and time, detecting abnormal locations or spatial outliers, and discovering and analyzing patterns of spatial association. Geographically Weighted Regression (GWR) is probably the most widely used model for analyzing multivariate relationships and is considered more efficient than others at detecting them. First introduced by Brunsdon, it is a linear regression model that allows parameter values to be calculated and mapped for each location in space. The assumption of such models in discovering spatial patterns is that the form of the relationship is the same at all locations, and they try to fit a trend line to the data. This approach also suffers from outliers. In real situations, however, the form of the relationship is not the same across space, and dependent and independent variables may have different types of relationship in different areas. Guo (2010) therefore designed a new model that is not based on this assumption and can detect spatially varying forms of relationship. This model is detective rather than predictive, so it cannot give a relation ratio. However, it can discover relationships with high accuracy and confidence.
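To make the GWR idea concrete, the following sketch fits a distance-weighted simple regression around one focus location. It is a minimal illustration under assumptions: one predictor, a fixed Gaussian bandwidth, and a local R² computed from the weighted residuals of that single local model. GWR software may define the kernel, the bandwidth selection and the local R² differently, and the function name `gwr_local_fit` and the toy data are hypothetical.

```python
import math

def gwr_local_fit(x, y, coords, focus, bandwidth):
    """Weighted simple regression of y on x around `focus` (Gaussian kernel).

    Returns (intercept, slope, local_r2)."""
    # Gaussian distance-decay weights: nearby observations count more.
    w = [math.exp(-0.5 * (math.dist(p, focus) / bandwidth) ** 2) for p in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum(wi * (yi - intercept - slope * xi) ** 2
                 for wi, xi, yi in zip(w, x, y))
    ss_tot = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return intercept, slope, 1.0 - ss_res / ss_tot

# Toy data: ten locations on a line; the relationship changes between halves.
coords = [(float(i), 0.0) for i in range(10)]
x = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y = [2 * xi for xi in x[:5]] + [10 - xi for xi in x[5:]]

west = gwr_local_fit(x, y, coords, focus=(0.0, 0.0), bandwidth=1.0)
east = gwr_local_fit(x, y, coords, focus=(9.0, 0.0), bandwidth=1.0)
print("west slope:", round(west[1], 3), " east slope:", round(east[1], 3))
```

The local slopes differ between the two ends of the line, but each local model is still a straight line. That is the assumption the local entropy model drops: it only asks whether the variables are related in a neighborhood, not what linear coefficient links them.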
Entropy-based approaches are widely used in various fields of science; the concept of entropy comes from thermodynamics and statistical physics. Entropy represents the uncertainty and randomness of a system or phenomenon. Shannon entropy and its generalization, the Rényi entropy, are among the measures used. For a d-dimensional real data space R^d, the Rényi entropy is calculated as follows:
$$H_\lambda(f) = \frac{1}{1-\lambda} \log \int_{\mathbb{R}^d} f^{\lambda}(x)\,dx$$
where x is a d-dimensional vector, f(x) is the probability density function and λ ≥ 0 (λ ≠ 1) is the Rényi entropy level. The main challenge in using entropy in exploratory analysis is that the density function is unknown.
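As a quick numerical illustration, the discrete analogue of the Rényi entropy replaces the integral of the density by a sum over probabilities. The sketch below assumes a finite probability vector rather than the continuous density used in this section, and the function name `renyi_entropy` is chosen for the example.

```python
import math

def renyi_entropy(probs, lam):
    """Rényi entropy of a discrete distribution at level `lam` (lam >= 0)."""
    probs = [p for p in probs if p > 0]
    if abs(lam - 1.0) < 1e-12:
        # The limit lam -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** lam for p in probs)) / (1.0 - lam)

uniform = [0.25, 0.25, 0.25, 0.25]        # maximal uncertainty over 4 outcomes
concentrated = [0.97, 0.01, 0.01, 0.01]   # almost certain outcome

for lam in (0.5, 1.0, 2.0):
    print(lam, round(renyi_entropy(uniform, lam), 4),
          round(renyi_entropy(concentrated, lam), 4))
```

For the uniform distribution the value equals log 4 at every level, while the concentrated distribution scores lower. In the same way, structured (related) data have lower entropy than unstructured data.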
An alternative to estimating the density directly is to use the minimum spanning tree (MST). Steele showed that the total length of the minimum spanning tree of multivariate data points is associated with the unknown density function. It is therefore possible to use the MST instead of the probability density function to estimate the Rényi entropy of a multivariate point dataset. The approach continues as follows:
(1) Estimating the multivariate Rényi entropy for each local neighborhood.
(2) Using a permutation-based approach to construct an empirical distribution of entropy values for each local neighborhood under the null hypothesis, which is used to convert the local entropy value to a P-value.
(3) Processing the P-values with a set of statistical tests, including the Bonferroni, Bonferroni adjusted for spatial dependence, and FDR.
(4) Creating a local entropy map to show the significance level of each local region and allow interactive examination of significant local multivariate relationships.
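Steps (1) and (2) can be sketched for a single neighborhood as follows. This is an illustration under assumptions, not the local entropy software: the entropy is estimated from the MST length only up to an additive constant (so values are comparable only between point sets of the same size and dimension), the null hypothesis is simulated by shuffling one variable independently of the other, and no data normalization is applied. All function names and the toy data are hypothetical.

```python
import math
import random

def mst_length(points, gamma=1.0):
    """Sum of edge lengths (raised to `gamma`) of the Euclidean MST, via Prim."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known link from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def mst_entropy(points, gamma=1.0):
    """MST-based Rényi entropy estimate, omitting the additive constant."""
    n, d = len(points), len(points[0])
    alpha = (d - gamma) / d    # entropy level implied by the edge exponent
    return math.log(mst_length(points, gamma) / n ** alpha) / (1.0 - alpha)

def permutation_p_value(xs, ys, n_perm=99, seed=0):
    """P-value: share of shuffled datasets with entropy <= the observed one."""
    rng = random.Random(seed)
    observed = mst_entropy(list(zip(xs, ys)))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any x-y relationship
        if mst_entropy(list(zip(xs, shuffled))) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(1)
xs = [rng.random() for _ in range(30)]
ys_related = [x * x for x in xs]             # nonlinear but deterministic link
ys_unrelated = [rng.random() for _ in xs]    # no link to xs
p_related = permutation_p_value(xs, ys_related)
p_unrelated = permutation_p_value(xs, ys_unrelated)
print("related:", p_related, " unrelated:", p_unrelated)
```

Points that follow a relationship lie along a curve, so their spanning tree is short and the entropy is low compared with shuffled data. The relationship does not need to be linear to be detected, which is the property that distinguishes this approach from regression-based models.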
Together, the described features of the Max-P-Region algorithm and the detective approach of the local entropy model provide an efficient framework for discovering hidden spatial patterns of crime.

Study Area & Data
The study area is Tehran, the capital and the political, cultural, commercial and industrial center of Iran (Figure 2). The city has grown from a population of just over half a million in 1940 to approximately 8.3 million in 2014. Tehran has expanded from 4.2 square kilometers in 1727 to over 868 square kilometers at present. It is currently divided into 22 districts and is the biggest and most important city in the urban system of Iran.
The data used in the study include all pickpocketing events that occurred in Tehran. Pickpocketing point data were acquired from the Tehran 110 Police Department. In Iran, 110 is the unified emergency number for the public to report crime and access police services, and it is the primary source of official crime information. The current dataset includes crime events between March 1st, 2008 and February 29th, 2009. In addition, the latest available information and the required data layers were obtained from the Municipality of Tehran.
Figure 2. The study area

Methodology
The literature describes areal units called analytical regions, which are a good solution to many of these problems. These are homogeneous spatial units that are optimized for the preferred spatial reasoning procedure. In other words, an analytical region is composed of a set of connected small areas and is homogeneous in terms of predefined factors such as income or number of crime incidents.
In this research, our approach has two main parts. The first part transforms the raw data into reliable information. The initial data include pickpocketing incidents, bank locations, bus stations, subway stations and commercial parcels. Because the commercial parcels are polygons and the other layers are points, we created a grid map using the spatial toolboxes of ArcGIS (Esri) to aggregate the initial data into one layer. The cell size of this grid was defined from the average of the mean nearest neighbor distances calculated for the initial layers: the cell resolution was determined by the area of a circle with a radius equal to that average. After creating a grid map with 2890 cells, the initial data (banks, bus stations, etc.) were merged into the grid. This grid alone could not represent the main pattern of the spatial distributions, because the many small cells do not contain the same number of features. For example, the standard deviation of pickpocketing incidents on the grid is 2.11 and the number of incidents per cell ranges from 0 to 24. In addition, 67% of the cells have no pickpocketing occurrence. The situation is the same for the other variables. The heterogeneity of this grid map is too high for analyzing spatial multivariate relationships. The base blocks were therefore generated using the above-mentioned regionalization algorithm, which makes them more homogeneous and statistically more robust to scale and heterogeneity problems. To create the analytical regions, the variables in the grid were first calculated as the density of point features in each cell. We used area instead of count for commercial land use. Only variables that show spatial autocorrelation can be used in regionalization, so the variables were tested for spatial clustering using the Global Moran's I index. Figure 3 shows the workflow of this study.
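The cell-size rule described above can be written out as a short calculation. The sketch assumes one reading of the rule, namely that a square cell is given the same area as a circle whose radius is the average of the layers' mean nearest neighbor distances; the function names and the toy coordinates are hypothetical, and the brute-force distance search is only suitable for small point sets.

```python
import math

def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbour (brute force)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def cell_side_from_layers(layers):
    """Side of a square cell whose area equals that of a circle of radius r,
    where r is the average of the layers' mean nearest-neighbour distances."""
    r = sum(mean_nn_distance(pts) for pts in layers) / len(layers)
    return math.sqrt(math.pi) * r      # side**2 == pi * r**2

# Two toy point layers in arbitrary map units.
banks = [(0, 0), (0, 2), (2, 0), (2, 2)]
bus_stops = [(0, 0), (4, 0)]
print(round(cell_side_from_layers([banks, bus_stops]), 3))
```

Tying the cell size to observed point spacing keeps the grid from being much finer or much coarser than the data can support.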

Results and Discussion
Following the proposed method, the first step is to identify crime clustering. Table 1 shows the densities of pickpocketing and of the different land uses together with their spatial autocorrelation test results. According to the table, the given distributions contain a number of clusters. In the second step, we used the ClusterPy package to apply the Max-P-Region algorithm with a threshold of 10 pickpocketing incidents in each analytical region, a value based on experimental findings. This generated 186 analytical regions. In the final step, we analyzed the spatial multivariate relationships between the dependent variable (pickpocketing incidents) and the independent variables with respect to the crime hotspot map. The multivariate relationship maps were created using the local entropy map model, the local entropy software (Note 1) and ArcGIS. Two important parameters are the number of neighbors and the magnitude effect, which ranged between 35 and 75 and between 0.05 and 1.5, respectively. There is no predefined procedure to determine these parameters, but the results and spatial patterns are stable.
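The Global Moran's I statistic used for the autocorrelation test can be computed directly from its definition. The sketch below is illustrative only: it uses binary rook-contiguity weights on a small regular grid, which is an assumption and not necessarily the weighting used for Table 1, and it omits the significance test that accompanies the index.

```python
def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a grid, cells indexed row-major."""
    w = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = {}
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs[rr * cols + cc] = 1.0
            w[r * cols + c] = nbrs
    return w

def morans_i(values, weights):
    """Global Moran's I; `weights` maps i -> {j: w_ij}."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    s0 = sum(wij for nbrs in weights.values() for wij in nbrs.values())
    num = sum(wij * dev[i] * dev[j]
              for i, nbrs in weights.items() for j, wij in nbrs.items())
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

w = rook_weights(4, 4)
clustered = [1, 1, 0, 0] * 4                                   # west half high
checker = [(r + c) % 2 for r in range(4) for c in range(4)]    # alternating
print("clustered:", round(morans_i(clustered, w), 3))
print("checkerboard:", round(morans_i(checker, w), 3))
```

Positive values indicate that similar values sit next to each other (clustering), and negative values indicate alternation. Only variables with a clearly positive, significant index would pass the screening described above.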
Figure 4 shows the pickpocketing hotspots. The significant multivariate relationships between the independent and dependent variables are reported at three different levels; the higher the level, the stronger and more statistically justified the relationship. The figure shows four pickpocketing hotspots that extend from north to south and continue along the eastern part of the city. The southern and western hotspots are largely separate, whereas the central and northern hotspots adjoin each other.

Figure 4. Hotspot analysis of pickpocketing incidents on grid map and analytical regions
The relationship between pickpocketing and bus stops is illustrated in Figure 5. The southern hotspot is at the high Bonferroni level, meaning a strong correlation between pickpocketing and areas dense in bus stops. The eastern part of the study area shows the same structure at the adjusted Bonferroni level, with a confidence of 0.99983 (99.983%).
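The significance levels named in the maps come from multiple-testing corrections applied to the regional P-values. The sketch below shows how a list of P-values could be labeled with the plain Bonferroni cut-off and the Benjamini-Hochberg FDR cut-off. It is an illustration under assumptions: the Bonferroni variant adjusted for spatial dependence is not implemented, and the function name, the 0.05 level and the example P-values are hypothetical.

```python
def significance_levels(p_values, alpha=0.05):
    """Label each p-value by the strictest test it passes:
    'bonferroni', then 'fdr', otherwise 'not significant'."""
    m = len(p_values)
    bonferroni_cut = alpha / m
    # Benjamini-Hochberg: find the largest k with p_(k) <= (k / m) * alpha.
    fdr_cut = 0.0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * alpha:
            fdr_cut = p
    labels = []
    for p in p_values:
        if p <= bonferroni_cut:
            labels.append("bonferroni")
        elif p <= fdr_cut:
            labels.append("fdr")
        else:
            labels.append("not significant")
    return labels

# Hypothetical P-values for ten regions.
p = [0.0001, 0.004, 0.018, 0.03, 0.2, 0.5, 0.7, 0.9, 0.01, 0.6]
for value, label in zip(p, significance_levels(p)):
    print(f"{value:<7} {label}")
```

Bonferroni is the most conservative of the tests, so a region that passes it, such as the southern hotspot discussed here, carries the strongest evidence of a local relationship.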

Figure 5. The spatial relationships of pickpocketing and bus stations density
The correlation between the dependent variable, pickpocketing, and commercial land use is shown in Figure 6. The correlation is strong in the northern, southern and central parts, the last of which reaches the Bonferroni significance level. The central part has the same pattern as the hotspot analysis of pickpocketing incidents on the grid map and analytical regions shown in Figure 4. However, the north-south part does not correspond to a hotspot in the same area of Figure 4 and overlaps a region with the opposite crime distribution pattern. Therefore, there is a meaningful correlation between pickpocketing and commercial land use density only in a very small central part. There is also a spatial pattern between pickpocketing and the distribution of bank branches (Figure 7). Three hotspots indicate a correlation between these variables at a significant Bonferroni level. The eastern and central zones show a low correlation between crime events and the distribution of bank branches. Overall, there is a meaningful correlation between pickpocketing and bank branch distribution with a confidence of 0.994 (99.4%).
There are three zones showing a correlation between subway stations and pickpocketing (Figure 8). The northern part is similar to the same zone in Figure 4, with a confidence of 0.999 (99.9%). Furthermore, the other hotspot patterns correspond closely to those in Figure 4.
To assess the validity of the model on the same datasets, we compared the local entropy model with the GWR model. We ran GWR in ArcGIS (Esri) with an adaptive Gaussian kernel and used AICc to determine the bandwidth (neighborhood size).
As illustrated in Figure 10, GWR detected a local R² of 0.60 between bus station density and pickpocketing in the western part of the city. Compared with Figure 4, this location lies in a cold spot of pickpocketing, so the detected relationship is not meaningful and we cannot infer a hypothesis about the variables there. GWR also detected a local R² of 0.30 between pickpocketing and metro station density near the city center. This area lies within a pickpocketing hotspot, so a relationship between the variables is plausible, but a local R² of 0.30 is too weak to declare that one exists. The GWR model does not indicate any other meaningful relationship between the independent and dependent variables. In this comparison, therefore, GWR could not detect a meaningful relationship, whereas the local entropy model detected relationships between the dependent and independent variables at a high significance level.
Figure 10. Model viability assessments with GWR
Based on the results, the proposed model has a number of pros and cons. The first advantage of local entropy is that the analysis is based on the specific spatial conditions of land use type and crime. In contrast, the building blocks of the GWR analysis are census blocks. Furthermore, the local entropy model does not have the MAUP, which causes inconsistency. Finally, we use analytical regions built on the fishnet grid, which can be considered an advantage of this model. The disadvantage of the proposed model is that it is a detective method: it discovers that a relationship exists but does not calculate the actual correlation value. Moreover, field data are necessary to increase the accuracy and precision of the applied model, but within the scope of this research we relied only on specific scales. Because data at fine scales are limited for security reasons, it would be wise to compare and analyze the model at the different spatial scales for which data exist. The model is suitable at the scale used here; we suggest preparing proper datasets to increase its accuracy.

Conclusion
In this study, we formulated a detective approach to the multivariate spatial analysis of crime. We first applied a regionalization approach to create analytical regions using the Max-P-Region algorithm. Afterwards, we used a non-parametric approach to investigate the spatial correlation between urban land use and crime occurrence. In this context, bank branches, bus stations and subway stations were analyzed as land use types, and pickpocketing was selected as the crime under study. We aimed to assess the geographical environment and spatial conditions that affect criminal activities and to explore the correlation of each land use type with pickpocketing.
We used the local entropy model in this research. Its approach performs better in exploring correlations between variables than similar models such as GWR. GWR gives a correlation ratio based on the hypothesis of a single form of relationship, but as described in the analytical background, this assumption does not hold in every neighborhood. Moreover, given the high residual errors, we do not know how accurate the fitted correlation value is at each location. Combining the detective ability of the local entropy model with analytical regions produced by the Max-P-Region algorithm is a novel approach to exploratory crime analysis. The use of analytical regions decreases the effect of the MAUP and leads to better model conformity.
This study also presents a suitable framework for exploratory research. The approach indicates a meaningful correlation between pickpocketing and the mentioned land uses, which is significant in some regions. Given the correlation detected between the independent and dependent variables in several regions, the social and environmental conditions there need to be studied and investigated. Moreover, these regions require monitoring and inspection, the presence of police and guardians, and the use of proper devices and equipment such as CCTV, alarms and security cameras.
The correlation between the investigated variables was used to find crime hotspots. However, because of the scale of the study (the Tehran megacity), the underlying reasons and other factors were outside the scope of our research. Considering more data, such as social, infrastructural and environmental factors, to explore the reasons and motivations behind different crime types is left as future work.

Compliance with Ethical Standards
Funding: This study was not funded by any institution.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent: Not applicable.

Figure 3. Applied methodological framework
After producing a clustering spatial pattern, we use all variables to show the zoning patterns.

Figure 6. The spatial relationships of pickpocketing and commercial land use density
Figure 7. Spatial relationships of pickpocketing and bank branch density

Figure 8. Spatial relationships of crime and subway stations density

Figure 9. Dissolved areas based on multivariate relationships

Table 1. Results of spatial autocorrelation test