Crop Identification by Using Seasonal Parameters Extracted from Time Series Landsat Images in a Mountainous Agricultural County of Eastern Qinghai Province , China

Time series vegetable indexes (Vis) have been evidenced a useful data to extract vegetable phenology and identify crop types. This paper conducted such a research in Qinghai Province by using Landsat TM images, via four steps, i) sampling single-crop plots and extracting crop spectrums based on pure signle-crop pixels; ii) building time-series vegetable indexes by using Landsat 8 TM images (2013-2014); iii) extracting seasonal parameters according to algorithms defined in TIMESAT program; vi) generating a decision tree for identifying crop types and validate classification accuracy via ground investigation. The results indicate that crops planted in a larger continuous range, such as spring wheat, potato and rapeseed, achieved an acceptable accuracy of above 70%, while crops planted too dispersedly (like broad bean, which is often inter-planted with other crops) or with a too smaller planting range (like barley), remained a poor recognition rates (below 50%). The value of this work lies in it displayed not only the classification accuracy of crop types in this region by using such methodology, but also the feasibility of integrating VIs calculation, seasonal parameter extracting and decision tree generation into one computer program, which is highly desired in this region.


Introduction
Wide-ranging, long-term, and spatially accurate cropland classification data is a valuable source of information for government agencies, private sector organizations, scientists, educators, and others who use land-cover information.Crop mapping on an annual basis can provide improved estimats of near real time changes in crop production and can greatly benefit strategic planning in agro-ecosystems, crop type maps are among the most important datasets in crop management and yield estimation (Boryan, Yang, & Mueller, 2011;Shao, Lunetta, Ediriwickreme, & Liames, 2010).Remote sensing data has become an efficient and reliable approach of deriving such information across large area where field-based methods are of limited utility; many researches have demonstrated successful application for various agricultural usages (Chen & Zhuang, 2012;Oetter, Cohen, Berterretche, Maiersperger, & Kennedy, 2000;Wall, Laricque, & Léger, 2008).
Phenology is the nature's calendar on periodic biological events and their controlling factors of environment, many interactions in nature depend on timing.In fact, phenology affects nearly all aspects of the environment, including the abundance, distribution, and diversity of organisms, ecosystem services, food webs, and the global cycles of water and carbon.For most plants species, phenological event like onset of green, mature and senescence often occurs at a relatively stable time node over a long time period, therefore, phenological characteristic is frequently used as an auxiliary information or even priority metric to identify plant type, such as distinguish evergreen forest from seasonal forest and trees from shrubs etc (Reed, Brown, Vanderzee, & Ohlen, 1994;White, Hoffman, Hargrove, & Nemani, 2005).Time-series Vegetation indices (VIs) derived from various remote sensing images has become one of the most promising way for such purpose, those quantitatively constructed indicators showed a high correlation between vegetation vigor/greenness and their overall spectral properties, which is generally estimated as a ratio or combination of several spectral bands via enhancing spectral sensitivity of objected vegetation characteristics (leaf chlorophyll content, leaf area, canopy cover and structure etc.) while weakening environmental factors (Simonneaux et al., 2008;Gong & Shi, 2003).
In the field of crop mapping, seasonal parameters, like the starting and ending of the season, seasonal amplitude, seasonal length etc., that derived from various time-series VIs, have being used successfully in discriminating most of stable crops, such as wheat, corn, soybean and sorghum etc. (Zhong, Hawkins, Biging, & Gong, 2011;Patel & Oza, 2015;Oteros, García-Mozo, Botey, & Galán, 2015).However, the classification accuracy of such research usually depends on the matching situation between image resolution and fields size, generally speaking, coarse spatial resolution (250-1000 m) with high temporal resolution (1-3 days) images are more suitable for large scale observation, while moderate spatial resolution (30 m) with lower temporal resolution (16 days) images are more suitable for local scale application.Because of such limitation, most of existing researches focused on plain area where the size of crop land is usually several times larger than image resolution (e.g., 15-30 hectare crop lands matches to 250 m MODIS image) (Fischer, 1994;Wardlow & Egbert, 2008;Xu, Yang, Long, & Wang, 2013).
On the other hand, from the view of local food security and global agricultural diversity, crops planted in mountainous area cannot be ignored, where crop land is usually constrained within or along river valley with a relatively smaller field size and discontinuous spatial distribution, therefore traditional agricultural survey is extremely difficult to implement and remote sensing investigation on crops phenology is highly desired.Qinghai-Tibet plateau is one of such region which is the highest crop planting region in the world (Zhang et al., 2008;Brown & Waldron, 2013).Inspired by above facts, this paper conducted such a research in a major crop production county in Qinghai Province, aming to test the feasibility of extracting seasonality parameters from 30 m resolution TM images, and building growth curves for each type of crop and obtaining decision rules for discriminating them as well.

Research Area
We selected a typical agricultural watershed in eastern Qinghai-Tibet Plateau (Figure 1a), which belongs a semi-wet climate region with an elevation range of 2102 m-4582 m (Figure 1b), average annual temperature of 6.3 centigrade, average annual precipitation of 587 millimeters, and a short growth season of less than 3 months, the stable crops commonly planted here including spring wheat, highland barley, rapeseed, potato and broad bean.Agricultural land in this region accounts for 30% of total land area, half of them are cultivated as cropland (Figure 1c).Among those cropland, about 20% located in river valleys where the elevation is below 2350 meter that can be irrigated fully, about 40% located in lower elevation hillsides (2350-2450 meter) that can be irrigated partly, the remaining 40% located in higher elevation hillsides (2450-2650 meter) that can't be irrigated and therefore belongs to totally rainfed cropland (Figure 1c).
Given the consideration of field size that most farmers planted for single crop is much smaller than 0.2 hectare, which can't match the minimum requirement of TM images interpretation.A demonstration project region of China's National Standard Farmland in Datong County (Agriculture and Animal Husbandry Bureau of DaTong County, 2013) was selected as sampling source of three main crops, spring wheat, potato and broad bean, where 14 plots was picked up as the single-crop plots for them.Another 5 plots was picked up for barley and rapeseed from northern part neighboring region for the sake of keeping consistent climate conditions for crops tested in this work, where there are no demonstration plots for them (Figure 1d).

Workflow
According to general rules of remote sensing data application and the main focus of this paper, we designed a three-step procedure (Figure 2) to fulfill this task.The key idea is applying a known set of crop seasonal parameters derived from ground-true sampling of single crops into classification of those unknown crop type plots.The scientific evidence behind this lies in the fact of although seasonal parameters of each crop is varied across years due to climate fluctuation and changes of field managements, however, the general rules of greening, browning and harvesting tended to be consistent within the same sort of crops, if there is no significant changes in climate characteristics (Sakamoto et al., 2005;Odebweller, 1984).

Image Data
Several time-series multispectral data are available for monitoring plant phenology at a continuous temporal trajectory, which can be roughly sorted into three categories according to their resolution and obtaining method.First, freely downloaded higher temporal resolution (1-3 days) but coarse spatial resolution (250-1000 m) sensors like Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Very High Resolution Radiometer (AVHRR).Second, freely downloaded moderate spatial resolution (10-30 m) but coarse temporal resolution (16 days) sensors like Landsat Thematic Mapper (TM/ETM+) and Operational Land Imager (OLI).Third, acquisition data of both high temporal resolution (1 day) and spatial resolution (6 m) sensor, like Systeme Probatoire d'Observation de la Terre (SPOT).Given the combined consideration of temporal-spatial resolution and image accessibility, Landsat TM images is a better choice for such research in this region.
As one of the most valuable datasets available for mapping and monitoring the Earth surface over 40 years, Landsat sensors provide continuous images at a 16-day time step with a resolution of 30 meter at multispectral bands, which is widely used in time series VIs analyses.However, those optical sensors carried by Landsat 4, 5, and 7 are designed without specific bands for cloud detection, which made them hard to detect these high-altitude, cold and wispy clouds in mountain area.Landsat 8 fixed this deficiency by adding a new band of Short Wave Infrared (Band 9, 1.36-1.39μm) to Thermal Infrared Sensors (TIRS), which is especially helpful for detecting cirrus clouds (Cohen & Goward, 2004;Roy et al., 2014).Given above consideration, Landsat 8 OLI images is selected as the data source for deriving time-series VIs in this paper, and the initial images (the research area is totally covered by one scene at row 132 and path 34) available from USGS website for this region during 2013-2015 is 59 scenes (Table 1).

Ground Data
Using ground-true data to assistant interpretation of image data is a common way to improve research accuracy, general experience suggests a sampling strategy using smaller-size units but a larger number of samples provide more precise estimates than a strategy using larger-size units but a smaller number of samples with an equivalent total area (Stehman, Sohl, & Loveland, 2005).This is fulfilled by consulting local agricultural managers to screen a numbers of candidate fields for each crops firstly, then followed with a two-step sampling scheme, that is: i) to sample a set of single pure crop plots that can be served as reference standards of crop identification, which is fulfilled at 2014 winter when the field shape and size, as well as crop type can be observed clearly, 19 plots are collected under the area requirement of each plot larger than 0.5 hectare, by which a minimum pixel amount of 3 can be guaranteed with a 30 meter resolution image; ii) to verify a set of unknown crop plots that identified as certain crops according to classification rules that established in first step, which is fulfilled at 2014 winter and 2015 winter by labeling 3-5 plots for each crop and verifying actual crops that planted in that plots.

Auxiliary Data
A recent land use/cover map of 2010 (Figure 1c) is used as a mask to screen possible crop land parcels in 2015 so that scaling down verification scope.This map is provided by Remote Sensing Center of EPA in Qinghai Province, which is served as a base map for "Eco-Environmental Change Evaluation of first ten years in 21 Century in Qinghai Province".Other data including digital elevation map and basic geographical map are provided by local Geographical Information Center.

Image Processing
Multitemporal satellite-image datasets are affected by different sources of noise related to the stability of sensors, changes in satellite responsivity, changes in illumination, atmospheric effects, etc., as the homogeneity of time series of satellite images is crucial when studying abrupt or gradual changes in vegetation cover, several protocols have been proposed in their pre-processing: i) geometric correction, ii) radiometric calibration, iii) atmospheric correction, iv) topographic correction, and v) relative radiometric normalization (Bodart et al., 2011).As the standard Level-1 products of Landsat 8 are delivered with already geometrically corrected, we conducted three other corrections before time-series VIs extraction, all those processing are carried out with ENVI 5.3 software.

VIs Selection and Calculation
More than 150 VIs have been published in scientific literature, but only a small subset has been systematically tested enabling substantial biophysical basis and general applicability.Among them, Couples of VIs have been evidenced suitable for usage of crop classification, including Enhanced Vegetation Index (EVI), Green Vegetation Index (GVI), Leaf Area Index (LAI), Normalized Difference Vegetation Index (NDVI), Red Green Ratio Index (RGRI), Soil Adjusted Vegetation Index (SAVI) and WorldView Improved Vegetation Index (WVVI) etc.Three of them, namely NDVI, EVI and SAVI (Bannari, Morin, Bonn, & Huete,1995;Yang, Di, Yu, & Chen, 2011), were selected to build time-series VIs based on accounts of: i) they are mostly used VIs in crop related research, ii) they have better discriminating ability on land cover phases of crop lands such as dense green crops, sparse vegetation and bare soil, iii) they are derivable from bands calculation of Landsat 8 TM images.
NDVI is generally calculated as the normalized difference between near-infrared and red bands [NDVI = (NIR -RED)/(NIR + RED)].It is constructed on the fact that chlorophyll absorbs RED whereas the mesophyll leaf structure scatters NIR, and the NDVI value thus range from -1 to 1 where negative values correspond to an absence of vegetation.Although NDVI has been demonstrated usefulness in many relative studies, however, in some situation, the relationship between NDVI and vegetation can be biased in sparsely vegetated areas (e.g.arid to semi-arid zone) and dense canopies (e.g.tropic forest).For example, in sparsely vegetated areas with a leaf area index (LAI) of < 3, the NDVI is influenced mainly by soil reflectance while for LAI > 6 (densely vegetated areas), the relationship between the NDVI and NIR becomes saturate, therefore, other vegetation indexes like SAVI might be more appropriate for those areas (Boehmer, 1996;Ke, Im, Lee, Gong, & Ryu, 2015).
Soil Adjusted Vegetation Index (SAVI) is similar to NDVI [SAVI = 1.5 × (NIR -Red)/(NIR + Red + 0.5)], but it suppresses the effects of soil pixels by using a canopy background adjustment L, which is a function of vegetation density and often requires prior knowledge of vegetation amounts.This index is best used in areas with relative sparse vegetation where soil is visible through the canopy; however, it requires local calibration because it is difficult to predict how soil effects are manifested within large pixel areas, where many different types of soils and vegetation are aggregated together.For general application, an optimal value of L = 0.5 is often adopted as the most likely account of soil background variations (Huete, 1988).
Enhanced Vegetation Index (EVI) is developed to improve the NDVI by optimizing the vegetation signal regions, it uses the blue reflectance region to correct for soil background signals and to reduce atmospheric influences, including aerosol scattering.This index [EVI = 2.5 × (NIR -Red)/(NIR + 6 × Red -7.5 × Blue + 1)] provides complementary information about the spatial and temporal variations of vegetation, while minimizing many of the contamination problems present in the NDVI, such as those associated with canopy background and residual aerosol influences.Compared to NDVI, which is chlorophyll sensitive and responds mostly to RED variations, the EVI is more NIR sensitive and responsive to canopy structural variation, including LAI, canopy type and architecture, which is also more useful in regions of higher LAI where NDVI becomes saturated easily (Huete et al., 2002).

Time-Series VIs Building
Although there are 23 scenes of Landsat 8 OLI images (at 16-day step) in a year theoretically, it is actually impossible to build a complete 23 scenes of time-series images within each year because of possible sensor dropping out and heavy cloud contamination or combination of both, therefore an interpolation of missing data is usually required to obtain a complete time-series dataset.Limited by the amount of available images and their cloud cover quality, this paper developed three assumptions to build a complete time-series dataset based on following facts: i) there are no crop growing in field during winter (end of October to early March) in research area, which makes it is acceptable to fill missing data of this period directly with any available data in other years; ii) all sampled plots are cultivated with same crops in 2013 and 2014 according to filed investigation, which make it reasonable to fill missing data of corresponding date with each other under a stable climate condition where precipitation and temperature keeps stable; iii) all missing values that occurred in continuous intervals can be interpolated with neighbor values by using method of double cumulative analysis and those in discontinuous intervals can be filled with an average value of neighboring time nodes.

Curve Fitting and Parameter Extracting
As a raw dataset of time-series VI contains not only useful signal of seasonal, but also changes that occurred gradually or abruptly, and combined noises originated from remnant geometric errors, atmospheric scatter or cloud effects, therefore, trend separation and noise removing are often required before any further usage.A number of methods have been developed for such purposes, among which, curve fitting techniques showed remarkable effects on spurious observations removing, missing values interpolation and obtaining of equal-interval observations as well, which is essential for time-series data rooted phenological research (Verbesselt, Hyndman, Newnham, & Culvenor, 2010;Chen et al., 2004).
TIMESAT (Jönsson & Eklundh, 2004) (version 3.2, http://web.nateko.lu.se/timesat/timesat.asp) is widely used software in this field, which offers not only predefined seasonal parameters like the dates of season starting and ending, VIs values of base level and peak level, dates of 20% and 80% biomass and corresponding VIs values, seasonal length and seasonal amplitude etc., which can be used as criterions for crop discrimination; but also three mostly used fitting methods that used in phenology researches, such as adaptive Savitzky-Golay filtering (SG), Asymmetric Gaussian (AG) and Double Logistic function (DL).In view of the complicity of developing a computer program for those three curve fitting methods, this paper adopted a TIMESAT-rooted approaches to get the fitted curves for each crop, which is then be served as input data for seasonal parameter extracting.The results are listed as Figure 3 and Table 2.

Decision Tree and Crop Classification
Decision tree (DT) is an effective tool used in various classifications and mapping field (Pal & Mather, 2003;Lees & Ritman, 1991), given the fact of lacking of crops spectral data and limited sampled plots, this paper adopted a three steps approach to generate a decision tree, which is then be used as classification criterions for crop identification within unknown crop type land in this region, which is: i) generate an original decision tree by input all available pixels that has known crop types via field investigation; ii) verify classification accuracy by comparing crop types assigned by decision tree with that of field investigation, and excludepixels with classification errors; iii) construct a decision tree that can be served as prediction rules for crop identification within unknown crop type pixels, which is generated from above two steps.This work is finished in software Clementine 12.0 by using C5.0 algorithm, and the final decision tree is displayed as below (Figure 4).

Verifying and Accuracy Assessment
A crop classification map (Figure 5) is generated by masking the result map of decision tree with a cropland map in 2010, with goals of improving classification accuracy as well as save processing time.Accuracy verification is followed by calculating the correct rate of identification within the sampled plots.This accuracy varies among five crops this paper focused, in overall, crops that planted in a relatively larger continuous scales, like spring wheat, potato and rapeseeds, can be identified with an acceptable accuracy, which is 76.92%, 64.29%, and 72.73% in sequentially.However, crops that planted either dispersedly (like broad bean) or too small sacles (like barley, which is planted in margin lands of less than 0.5 ha, where plots of meeting image resolution requirements are hard to find, that's why only two plots were sampled for barley in this test), are hard to be differentiated, which resulted in a pretty lower identification accuracy of them.
Beside this, the unknown area that can't be identified from this method remains about 56.3% compared to the total agricultural area (8.36 × 10 4 hectare) extracted from land use/cover map of 2010, and this area is also two times larger than statistical data of cropland (4.58 × 10 4 hectare) in that year.That is to say the agricultural land in the land use/cover map of 2010 contains many other types of agricultural land other than cropland, which need to be excluded if applying this map for the purpose of screening cropland.
Figure 5. Crop classification map generated by using decision tree within agricultural land in research area

Discussions
1) Quantity relationship between amount of sampled plots (pixels) and accuracy of crop identification need to be investigated further for moving this methodology from useful to usable In this work, a total amount of 134 pure single-crop pixels located in 38 plots was sampled as input data of known-crop type land, which including 5 crops planted over 2013 and 2015.Among which, three crops (spring wheat, potato and rapeseed) with sampling plots of above 6, achieved a higher classification accuracy (76.92%, 64.29%, and 72.73), while the other two crops (broad bean and barley) that have smaller sampling plots of less than 3, showed a poor recognition rates (below 50%).This indicated that plenty of sampling plots is vital for a higher identification rate.However, it's hard to find plenty amount of qualified plots that can meet image resolution requirement in this region because of decentralized planting patterns and undulating terrain limitations on field size, which has become the major barrier for further application of this method.
One possible solution is to collect as much as possible croplands that involved in various demonstration projects, which is usually implemented by local governments with some useful information on crop planting, such as area, crop type, field management and planting planning etc.In addition, a more detailed sampling strategies should be taken if a quantitative relationship between sampling amount and identification accuracy is desired, which should not only take more factors into account, like elevation, slope, aspect and ecotypes etc., but also a grouping experiment focusing on revealing quantitative relationship should be conducted, such as 100 pixels, 1000 pixels and even more pixels for each crops and their potential identification accuracy.
2) Accurate information of field management on crop planting is essential for missing value replacement over years even within the same plots There are always some images unqualified for establishing a time-series data set either because of contaminated images or missing of original images, so the replacement or supplement of a similar date images over years becomes one of most commonly used approaches to build a complete time-series data set.Therefore, the verification of crop types between replaced images and used images (usually the images of the same date over different years) are fundamental for such an approach, which requires detailed records of crop types and planting ranges over years.However, this approach has been evidenced difficult in this region not only because of crop rotation system that popular in this region, i.e., a two years rotation of spring wheat/potato-rapeseed or a three years rotation of spring wheat-rapeseed-potato, which results in different crops planted in the same plot over years.Besides, most of cropland that managed by farmers lacking records of fields management, which further narrowed potential sampling scale and resulted in a remarkable reduction on qualified crop land available for such a research.
There are two approaches recommended here for solving above problem, one is keeping on regular inspection on sampled plots so that the crop type and planting scale on the same plot over years can be recorded in accuracy; another one is seeking cooperation with agencies that running larger-scale crop lands and obtaining field records from them.However, both of those method will cost plenty of time and money input, which is unpractical for a short term research project.
3) Integrated knowledge of combining curve fitting algorithm into raster calculation processing is needed when a pixels-rooted curve fitting is required Five sets of fitted crop curves and seasonal parameters were obtained from known-crop type pixels, with the original intention of using them as input data of constructing a decision tree for prediction crop types within testing pixels.However, due to the limitation of TIMESAT software which can generate only one fitted curve for whole scene of pixels within one operation, while not generate fitted curves for each pixel.Therefore, the pixel-rooted curve fitting process that served as an antecedent step of extracting seasonal parameters remained unavailable via TIMESAT software, where integrated techniques of computer compiling is needed if a pixel-rooted seasonal parameters is desired.
This may be achieved by integrating algorithms involved in curve fitting and seasonal parameters extraction into raster calculation process, where comprehensive knowledge of algorithm and computer compiling languages is needed, and an automatic workflow of this methodology proposed in this paper will be achieved by this way, which will eventually make this methodology becoming a practical tool for crop management and planning.
4) Some newly developed method and comparisons among methods is needed for meeting multiple requirements on crop identification accuracy and their application It is well known that traditional pixel-based analysis of remote sensing data will result in inaccurate identification of crop types due to pixel heterogeneity, mixed pixels, spectral similarity, and crop pattern variability.Some newly developed techniques (Peña-Barragán, Ngugi, Plant, & Six, 2011;Zhong, Gong, & Biging, 2012), such as object-based image analysis that integrated with decision tree algorithms, or phenology-based crop identification methods by combining multiple vegetation indices, textural features and crop phonology etc., have showed a stable performance in large scale crop classification.Such methods are worth trying in research area because of complexing terrain conditions, where technologies of reducing data dependency on ground observation and obtaining automatic crop calendars across years is highly desired.

Conclusions
The purpose of this study is to test the feasibility of identifying crop types in a mountainous agricultural region, by using time-series VIs as well as seasonal parameters and growth profiles, which can be derived from freely downloaded TM images at 30 m resolution.The results indicate that those crops (e.g., spring wheat, potato and rapeseed) that planted in a larger continuous ranges and have plenty of qualified sampled plots available, can achieve an average accuracy of about 70%; while those either planted too dispersedly (e.g., inter-planted with other crops) or too smaller planting ranges will result in an insufficient qualified sampled plots and thus remains unrecognizable.
The value of this research lies in the schema it developed by integrating three key steps of VIs calculation and time-series building, growth curve fitting and seasonal parameter extracting, and generating a classification decision tree, into an automatically workflow that can be used as basic data of crop mapping and relative minoring activities, which is highly desired in this region.But constrained by several factors remained in this work, e.g. the limited investigation plots and shorter time-series data set obtained, computer compiling difficulty in generating fitted curves for each pixel etc., both the accuracy of crop identification and automatic level of this workflow have the potential to be improved in future.
The main conclusion of this work is list as below: i) The accuracy of classification is sensitive to the numbers of sampled plots, especially those of known crop-type plots and their spatial representative, thus the relationship between them need to be investigated further and a quantitative relationship will be helpful for any further application of such a methodology; ii) Accurate information of field management on crop planting is essential for missing value replacement over years, one feasible approach of obtaining ground-true observations is seeking cooperation from agencies that running larger scale crop lands; iii) Comprehensive knowledge of curve fitting and raster datasets processing is the key factor that affecting the seasonal parameters availability based on pixel level, resolving of this obstacle will enable the fulfillment of automatic workflow of the methodology proposed in this paper; iv) Some newly developed method are worth trying for better identification of crop types under complex environment, and comparisons among methods is needed for meeting various crop classification requirements and their application.

Figure. 1
Figure. 1 Agricultural land and sample plots for major crops in research area

Figure 2 .
Figure 2. Workflow of crop identification and classification

Figure 3 .
Figure 3. Fitted crop curves for three VIs using Double-Logistic filter algorithm derived from TIMESAT software

Figure 4 .
Figure 4. Decision tree used as prediction rules for crop identification within unknown crop type pixels

Table 1 .
Landsat 8 images available in research area during 2013-2015

Table 2 .
Seasonal parameters and variation ranges of NDVI for each crop