Integrating Geographic Information System and Discriminant Analysis in Modelling Urban Spatial Growth: An Example from Seberang Perai Region, Penang State, Malaysia

Rapid urbanization and its negative impacts have become major challenges for planners and policy makers, especially in developing nations. Urbanization is not a problem in itself; it becomes one when the urban population increases drastically, especially in countries that lack the resources and personnel to deal with it. In Malaysia, for example, the urban population has exceeded 72% and is expected to reach 75% by 2020. This paper aims to help address this challenge by describing the development of a spatial model of urban growth. A statistical technique, discriminant analysis, was used to develop a model that differentiates urban from non-urban land. Eight spatial variables were used to develop the model. Three sets of land use data, for 1990, 2001 and 2009, were used in model development and validation. The model accurately predicted 68.4% of built-up areas in 2001 and 68.0% of built-up areas in 2009. The model was then used to predict urban spatial growth for 2020. The results could potentially be used to identify areas likely to experience urban expansion at the expense of other land uses. This output might be useful for planners and decision makers in formulating rational policies that sustain urbanization and control unplanned growth.


Introduction
Urban land use is a dynamic phenomenon that changes across both space and time. Proper planning is therefore needed to ensure that new development does not have negative impacts on society, space and the environment (Batty, 1976; Chapin & Kaiser, 1979; Li & Yeh, 2000; Koomen & Stillwell, 2007). To assist in understanding and planning urban land use development, various theories and models have been developed and used (Batty, 1976; Chapin & Kaiser, 1979). Briassoulis (2000), as cited in Koomen and Stillwell (2007), for example, extensively discussed the most commonly used land-use change models and their theoretical backgrounds. Urban models have become popular tools for describing, simulating and predicting urban spatial growth. These models can be static or dynamic in nature, and can be characterized as descriptive, predictive or prescriptive. Early descriptive models include the concentric-zone concept (Burgess, 1925), the sector concept (Hoyt, 1939) and the multi-nuclei concept (Harris & Ullman, 1945), as cited in Chapin and Kaiser (1979). While these early models were associated with the isotropic plane of synthetic cities in the United States, predictive or explanatory models attempt to go beyond description by not only describing the existing land use pattern but also predicting the future spatial pattern of urban growth. Early models of this kind included the von Thünen model of agricultural land use and Alfred Weber's theory of industrial location (Chapin & Kaiser, 1979). Later models include the bid-rent curve model (Alonso, 1964), retail models based on spatial interaction (Reily, 1956; Lee, 1973), and dynamic spatial models such as the cellular automata (CA) model, originally developed by Ulam in the 1940s. The CA model became widely used in geographical research after Tobler (1979) introduced his geographical modeling concept. This model is claimed to be capable of generating complex systems from a series of simple rules (Batty & Xie, 1994). Furthermore, a CA model can easily be implemented with the raster data model used in GIS. The studies by Batty and Xie (1994), Clarke et al. (2007) and Clarke and Gaydos (1998) were among the earliest to integrate a CA model with a Geographic Information System (GIS) to simulate urban spatial growth.
Agent-based models have started to gain popularity, mainly because of their ability to mimic the behavior of agents who influence decision making within the systems being modeled. This approach has been used to model various systems such as residential choice, pedestrian movement and traffic flow (Torrens, 2000; Macgill, 2001; Torrens, 2012). These dynamic spatial models have been useful in capturing the spatial and temporal processes that shape urban spatial growth. However, because of their complexity, they have been less popular among planners with limited technical and computing skills, especially in developing nations such as Malaysia (Samat, 2002). Their implementation has therefore been confined to a small number of academic studies.
Various studies have used statistical analysis to model urban systems. The integration of GIS and statistical analysis in a loose-coupling framework has become widely used because it is simple and easy to apply. The studies by Hu and Lo (2007) and Nong and Du (2011), for example, integrated GIS and logistic regression analysis to model urban growth in Atlanta and in Jiayu County, Hubei Province, China, respectively. Those studies performed multiscale analyses that gave insight into the driving forces of urban expansion and the formation of urban growth patterns. Xie et al. (2005) used a similar approach to model rural-urban transformation in New Castle County, Delaware. Although statistical analysis cannot capture the dynamic spatial system of urban areas, it is useful for understanding the drivers of land use transformation and for predicting the spatial pattern of urban growth.
This study proposes a spatial model designed to be simple and applicable in the context of a developing nation such as Malaysia. It integrates GIS with a statistical technique, discriminant analysis, to model urban spatial growth in the Seberang Perai region of Penang State, Malaysia. A previous study by Samat (2002) used GIS and a cellular automata model to predict urban spatial growth in this study area. Although that model managed to simulate urban spatial growth, the factors influencing urban growth were determined from a literature review. Furthermore, that model was developed using Arc Macro Language (AML) within ArcGIS 9.3, which required programming skills, so its implementation has been limited to academic research. The present study builds on that model and is intended to be simple yet informative, so that it can be implemented easily by non-technical users. The following section describes the conceptual framework of the proposed model.

Conceptual Framework of the Proposed Model
The model was developed based on the geographical modeling concept proposed by Tobler (1979). It attempts to predict urban built-up areas from the historical growth pattern of built-up areas between 1990 and 2009. The potential sites for new built-up areas are influenced by a set of drivers, which may encourage or hinder new development between time t and time t + n. The study adopted an earlier model developed by Samat and Rainis (2001), which attempted to predict the growth of residential areas in the small town of Butterworth in Penang State. However, instead of looking at residential development, the present model was used to model the expansion of built-up areas within the Seberang Perai region of Penang State, Malaysia. The formulation of the model is given by Equation 1 below.

 
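$$B_{i,t+n} = f\left(X_{1,i,t}, X_{2,i,t}, \ldots, X_{N,i,t}\right) \qquad (1)$$

where $B_{i,t+n}$ is the predicted built-up status of site $i$ in year $t + n$, $X_{j,i,t}$ is driver (factor) $j$ influencing built-up areas at site $i$ in year $t$, $n$ is the modeling period (years), and $N$ is the number of drivers. In Equation 1, the function $f$ can take various forms. This study used a linear function, which is the simplest form. Thus, the above equation can be stated as:

$$B_{i,t+n} = a + \sum_{j=1}^{N} \beta_j X_{j,i,t} \qquad (2)$$

where $a$ is a constant and $\beta_j$ is the coefficient for factor $j$.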
The model attempts to predict the expansion of built-up areas. This study categorized land use into built-up and non-built-up areas. Thus, the model predicts two groups, Group 0 (non-built-up areas) and Group 1 (built-up areas), based on the drivers influencing urban development. The drivers included in this model were proximity to existing built-up areas, proximity to existing road networks, proximity to industrial areas, proximity to public facilities, proximity to flood-prone areas, proximity to water bodies, proximity to agriculture areas and proximity to town population centers.
Because the model separates two categories of land use, it produces a single discriminant function. The model was developed using the 1990 land use data to predict land use in 2001, and the 2001 land use data were then used to predict the expansion of built-up areas in 2009. The predictive capability of the model was validated against the actual built-up areas. The model was then used to predict the spatial expansion of built-up areas in 2020.
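As an illustration of how a single discriminant function separates two groups, the following is a minimal sketch of Fisher's two-group linear discriminant in plain Python. It is not the procedure used in this study, which entered eight variables stepwise in a statistical package. The function names, the two predictors and all sample distances are hypothetical, and equal prior probabilities are assumed.

```python
from statistics import mean


def fit_two_group_discriminant(rows, labels):
    """Fit Fisher's linear discriminant for two groups and two predictors.

    Returns (a, b) so that score = a + b[0]*x1 + b[1]*x2 is positive for
    sites closer to group 1 and negative for sites closer to group 0,
    assuming equal prior probabilities for both groups.
    """
    groups = {0: [], 1: []}
    for row, label in zip(rows, labels):
        groups[label].append(row)

    # Mean of each predictor within each group.
    means = {g: [mean(col) for col in zip(*pts)] for g, pts in groups.items()}

    # Pooled within-group covariance matrix (2 x 2).
    pooled = [[0.0, 0.0], [0.0, 0.0]]
    for g, pts in groups.items():
        for p in pts:
            d = [p[0] - means[g][0], p[1] - means[g][1]]
            for i in range(2):
                for j in range(2):
                    pooled[i][j] += d[i] * d[j]
    dof = len(rows) - 2
    pooled = [[v / dof for v in r] for r in pooled]

    # Invert the 2 x 2 matrix directly.
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]

    # Coefficients b = inverse(pooled) * (mean of group 1 - mean of group 0).
    diff = [means[1][k] - means[0][k] for k in range(2)]
    b = [inv[k][0] * diff[0] + inv[k][1] * diff[1] for k in range(2)]

    # Constant chosen so the score is zero midway between the group means.
    mid = [(means[0][k] + means[1][k]) / 2 for k in range(2)]
    a = -(b[0] * mid[0] + b[1] * mid[1])
    return a, b


def score(a, b, row):
    """Evaluate the linear discriminant function for one site."""
    return a + b[0] * row[0] + b[1] * row[1]


# Hypothetical distances (m) to the nearest road and nearest built-up area.
rows = [(100, 200), (150, 300), (200, 250), (120, 180),
        (800, 900), (700, 1100), (900, 1000), (850, 950)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = built-up, 0 = non-built-up
a, b = fit_two_group_discriminant(rows, labels)
predicted = [1 if score(a, b, r) > 0 else 0 for r in rows]
```

In this toy data set the built-up sites lie close to roads and existing development, so both coefficients come out negative: larger distances lower the score and push a site towards the non-built-up group.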

Method
The aim of the study is to integrate GIS and statistical analysis in predicting urban spatial growth. The model was developed based on time series data gathered from various agencies in the study area. The following section discusses the study area (Figure 1) and the data used in this study. Seberang Perai was selected as the study area because it has witnessed rapid urban expansion resulting from the industrialization policy adopted by the state government since the 1970s. Seberang Perai is located within the Northern Corridor Economic Region (NCER), which is planned to become one of the growth centers and to achieve world-class economic region status by the year 2025 (Kharas et al., 2010). Therefore, infrastructure such as the North Butterworth Container Terminal, the North-South Expressway and the Butterworth-Kulim Expressway was developed to support the industrial sector and promote economic growth within the region (JPBD, 2007; SPMC, 1998). Hence, this area stands out as a potential local centre for population growth and economic development in the northern region (JPBD, 2007). Seberang Perai has experienced a significant increase in population. In 1991 and 2000, its population was 342,625 and 736,306 respectively (DOS, 1991; 2000). By 2010, the population had increased to 838,999, and it was estimated to reach 990,000 and 1.1 million in 2015 and 2020 respectively (DOS, 2010; JPBD, 2007; Kharas et al., 2010). This increase in population will have a substantial impact on resources, particularly the land needed for housing and related facilities. For example, 32,930 housing units are projected to be needed between 2011 and 2015 (JPBD, 2007). Although the number of houses needed for the growing population has been determined, the potential sites for development are not known. Planners and decision makers therefore lack information on the potential impact of such development on existing urban spatial growth.

The Study Area
Another reason for choosing Seberang Perai is data availability. Access to data is an important issue in spatial modeling, since data input and database creation are time consuming and costly (de Bruijn, 1991). In Malaysia, as in many developing nations, few data sets useful for spatial modeling exist in digital format (Yaakup & Healey, 1994; Samat et al., 2011). The datasets used for this project include digital datasets of roads (at 1:50,000), sub-districts (at 1:75,000), slope (at 1:50,000), and land use (at 1:75,000) for 1990, 2001 and 2007, obtained from the Town and Country Planning Department. Other data, such as the road network and public facilities, were digitized from topographic maps (Department of Survey and Mapping, 1986). The soil map was obtained from Soo and Selvadurai (1969). The database was developed within ArcGIS 9.3.

Factors Influencing Urban Spatial Growth
A dataset was developed for modeling purposes based on the factors or drivers influencing urban spatial growth. The areas selected for new built-up development are influenced by physical, socio-economic and environmental factors. Physical factors include slope, soil conditions and geology. In this study, slope and geology were excluded because the study area is relatively flat, with only about 5% of the area having an elevation of more than 50 feet (SPMC, 1998). A strong soil foundation provides suitable sites for urban development since it reduces susceptibility to landslides and other environmental disasters (Urban Land Institute, 1990; IEM, 2000). In addition, fertile soils that are highly suitable for agriculture, particularly paddy, are protected from development in order to ensure food security and preserve arable land (Samat, 2002).
Built-up areas should be developed in areas with favorable environmental qualities. Areas surrounding existing development, areas with high accessibility and areas in close proximity to population centers are therefore likely to be developed in the near future. These sites are favorable because they reduce development costs, provide access to employment centers, and are attractive to buyers (Chapin & Kaiser, 1979; Urban Land Institute, 1978).
In this study, the factors influencing urban development were derived from data obtained from various agencies in the study area. Table 1 shows the data needed to produce these factors, which became inputs to the model. The factors were produced using the spatial analytical functions of ArcGIS 9.3.
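Proximity factors of the kind used in this model can be thought of as distance surfaces. The sketch below is an illustrative assumption rather than the ArcGIS procedure used in the study: it computes the straight-line distance from every cell of a small raster to the nearest feature cell by brute force. The function name `proximity_grid`, the 30 m cell size and the sample mask are hypothetical.

```python
import math


def proximity_grid(feature_mask, cell_size=1.0):
    """Return the distance from every cell to the nearest feature cell.

    feature_mask is a list of rows; a truthy value marks a feature cell
    (for example a road or an existing built-up cell). Distances are
    straight-line distances between cell centres, in map units.
    """
    features = [(r, c)
                for r, row in enumerate(feature_mask)
                for c, value in enumerate(row) if value]
    if not features:
        raise ValueError("feature_mask contains no feature cells")

    # Brute force: check every feature cell for every raster cell.
    return [[cell_size * min(math.hypot(r - fr, c - fc) for fr, fc in features)
             for c in range(len(row))]
            for r, row in enumerate(feature_mask)]


# Hypothetical 3 x 3 raster with a single road cell in the top-left corner.
road_mask = [[1, 0, 0],
             [0, 0, 0],
             [0, 0, 0]]
distance_to_road = proximity_grid(road_mask, cell_size=30.0)
```

The brute-force search is adequate for a small example only; a production workflow would rely on the distance tools of the GIS package.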

Model Development
In this study, the land parcel data layer was used as the base map onto which all other variables were combined. Both GIS and statistical analysis software were used to develop the model. In this loose-coupling modeling approach, GIS software was used to develop the database, while the modeling procedure was undertaken within statistical analysis software. The result was then converted back into the GIS software for validation and visualization (Yeh, 1999). Land use data were classified into two categories, 0 representing non-built-up areas and 1 representing built-up areas. Discriminant analysis was chosen, and the variables were entered using the stepwise method. This approach ensures that only significant variables are incorporated into the model. Table 2 shows the unstandardized canonical discriminant coefficients derived after the model was executed. Because the model differentiated only two groups of land use, only one discriminant function was derived. Built-up and non-built-up areas were predicted from this discriminant function by comparing the discriminant scores with the group means. The group mean was 0.518 for group 0 (non-built-up areas) and -0.316 for group 1 (built-up areas). Table 3 shows that group 0 was defined by proximity to existing built-up areas, proximity to major roads, proximity to public facilities, proximity to water bodies, and proximity to population centers. On the other hand, group 1 was defined by proximity to industrial sites, proximity to flood-prone areas, and proximity to agriculture areas. The overall accuracy was 68.0%.
The discriminant function was then used to predict the expansion of built-up areas in 2020. The result shows that most of the areas surrounding district and regional centers (refer to Figure 1) would be transformed to built-up areas. As shown in Figure 3, built-up areas were concentrated within the central and southern districts. This finding suggests that the policy implemented in this study area has managed to direct urban built-up areas towards the southern district and to protect paddy fields, which are mainly located in the northern district (SPMC, 1998; Kharas et al., 2010). Furthermore, the study managed to identify the drivers that influenced urban expansion.
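The comparison of discriminant scores with group means can be sketched as follows. The two group means are the values reported in the text; the nearest-mean rule is a simplifying assumption that corresponds to equal prior probabilities and may differ from the rule applied by the statistical package. The constant, coefficients and driver values are hypothetical, since the actual coefficients from Table 2 are not reproduced here.

```python
# Group means of the discriminant scores, as reported in the text.
GROUP_MEANS = {0: 0.518, 1: -0.316}  # 0 = non-built-up, 1 = built-up


def discriminant_score(constant, coefficients, values):
    """Linear discriminant score: constant + sum(coefficient * driver value)."""
    return constant + sum(b * x for b, x in zip(coefficients, values))


def assign_group(score, group_means=GROUP_MEANS):
    """Assign the group whose mean score is closest (assumes equal priors)."""
    return min(group_means, key=lambda g: abs(score - group_means[g]))


# Hypothetical constant, coefficients and driver values for one land parcel.
example_score = discriminant_score(1.0, [-0.002, -0.001], [300.0, 200.0])
example_group = assign_group(example_score)
```

Under this rule the cut-off lies midway between the two group means, so any parcel scoring above about 0.101 is assigned to group 0 and any parcel scoring below it to group 1.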
Although the study achieved a prediction accuracy of 68%, the model could be improved. One improvement would be to incorporate economic variables such as land value, which were not included because the data were unavailable. Furthermore, this model captured urban spatial growth at intervals of almost 10 years. It would be better to use shorter intervals (possibly 2 years) when investigating urban land use development. This would allow new developments to be captured individually.

Conclusion
This study demonstrated the application of GIS and statistical analysis in predicting urban spatial growth. Using discriminant analysis to categorize non-built-up and built-up areas, the study identified factors influencing urban spatial growth and predicted the expansion of built-up areas with an accuracy of 68%. The model was then used to predict the expansion of built-up areas for the year 2020. This year was selected because Malaysia plans to become a developed nation by 2020. Although the model was developed for a small region, it could become a mechanism for monitoring and forecasting urban spatial growth for the whole country. Its predictive capability would allow planners and decision makers to visualize the impact of a given planning policy on the future distribution of urban growth prior to its implementation.

Figure 1. The study area.
Figure 2 below shows the land use data used for model development and validation. Other variables representing factors influencing urban expansion were prepared in ArcGIS 9.3. These variables were then input into the Statistical Package for the Social Sciences (SPSS) version 18.

Figure 2. Distribution of built-up (grey) and non-built-up areas (green) from 1990 to 2009.

Table 1. Factors influencing urban development and data sources

Table 3. Standardized canonical discriminant function coefficients
Agriculture areas, especially at the fringe of existing built-up areas, have become attractive sites for development. Table 4 below shows the reduction in agricultural land area for the 5 major types of agriculture activity in the study area. The reduction of paddy fields was more than 50% between 1999 and 2001. However, after 2001, paddy fields were zoned under the Irrigated Agriculture Development Project, which prevented the conversion of paddy fields to built-up areas (Land and District Officer, 2001). Thus, after the implementation of this project, the reduction of paddy fields was quite small.

Table 2.
... types of agriculture activities in Seberang Perai
The study used the discriminant function given in Equation 3 to predict built-up and non-built-up areas in 2009. The result was then validated against the actual built-up and non-built-up areas in 2009. Table 6 shows the validation result for this model based on the 2009 land use data. The validation shows a consistent result, with an overall model accuracy of 68.0%. Furthermore, the accuracy for the built-up category is 75.9%, which is close to the accuracy obtained in the validation with the 2001 land use data. This result is considered acceptable based on the standard of Monserud and Leemans (1992).
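The validation figures quoted above (overall accuracy and accuracy for the built-up category) are of the kind obtained from a two-by-two cross-tabulation of actual against predicted groups. The following is a minimal sketch of that calculation; the function name `validation_summary` and the ten sample parcels are hypothetical and do not reproduce the study's data.

```python
def validation_summary(actual, predicted):
    """Cross-tabulate actual and predicted groups (0 or 1).

    Returns (table, overall, per_group), where table[(a, p)] counts parcels
    with actual group a and predicted group p, overall is the share of
    correctly classified parcels, and per_group[g] is the share of parcels
    in actual group g that were predicted as g.
    """
    table = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        table[(a, p)] += 1

    total = sum(table.values())
    overall = (table[(0, 0)] + table[(1, 1)]) / total
    per_group = {g: table[(g, g)] / (table[(g, 0)] + table[(g, 1)])
                 for g in (0, 1)}
    return table, overall, per_group


# Hypothetical parcels: 1 = built-up, 0 = non-built-up.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
table, overall, per_group = validation_summary(actual, predicted)
```

Reporting the per-group accuracy alongside the overall figure matters here because the two groups can differ in size, so a model may classify one group much better than the other while the overall accuracy looks moderate.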