A New Approach to Statistical Process Control: Identification of Outliers in Yield Maps

Precision agriculture tools are of great importance in Brazilian agribusiness, enabling higher yields and lower production costs. Harvest monitoring systems make it possible to identify localized problems within an area; however, the monitor must be working properly so that it does not record incorrect information. Therefore, the purpose of this study was to propose a new approach to identifying discrepant points in harvest maps using statistical process control, and to define the best multiple of the standard deviation for identifying these points. The work was conducted during the soybean harvest at São Geronimo farm, in a 38-hectare area in the municipality of Cândido Mota, state of São Paulo. Data were collected with a Stara yield monitoring system (model Topper Maps) set to record information every three seconds during harvest. The yield data were used to build an individual values control chart to identify out-of-control points so that they could be removed. Of the standard deviation multiples evaluated, 2σ and 3σ produced average yields closest to the actual average yield of the area. Individual control charts can be used to set control limits and identify possible discrepancies. The 3σ multiple provided the most reliable information.


Introduction
Some technologies monitor yield point by point during harvest, such as yield monitors combined with sensors mounted on harvesters, which collect large amounts of information at short time intervals. The use of high technology in the field, applied to process mechanization, agricultural inputs, no-till systems, biotechnology, and precision agriculture (PA), has brought numerous changes to today's commercial agriculture. Many PA tools have contributed, and still contribute, to the development of agriculture; among them, yield maps stand out (Santi et al., 2013).
The use of PA has increased, mainly through harvest maps, which support the monitoring of crop yield. Once a map of an attribute has been produced, it is possible to locate problems in the field and apply the appropriate corrections for the next harvest.
Yield monitors used with sensors mounted on harvesters collect yield information in large quantities at short time intervals. However, not all of the information collected reflects the actual yield of the field, and recording errors are common (Molin, Cremonini, Menegatti, & Gimenez, 2000). Some of these errors are eliminated by mapping software. Still, some errors are relatively complex to identify and characterize (Gimenez & Molin, 2004).
Errors such as smoothing, volume calibration, incorrect platform width, harvester filling time, grain retraction, and crop losses have been found in harvest maps (Moore, 1998; Blackmore & Marshall, 1996; Larscheid, Blackmore, & More, 1997; Molin, Cremonini, Menegatti, & Gimenez, 2000). Menegatti and Molin (2003) developed a methodology to identify and characterize errors in yield maps, in which frequency distribution histograms, together with upper and lower statistical limits, were used to characterize discrepant data in the data set.
One of the main objectives of Statistical Process Control (SPC) is to eliminate variability, or part of it (Hessler, Camargo, & Dorion, 2009). This variability can be identified through graphs called control charts, which show whether a process is stable or unstable according to whether points fall inside or outside the control limits. Points outside the limits, when attributed to process errors or special causes, can be considered outliers. It is therefore assumed that control charts can be effective in identifying possible discrepant points in harvest maps. Thus, the objective of this study was to propose a new approach to identifying discrepant points in harvest maps using statistical process control through individual values control charts, and to define the best multiple of the standard deviation for identifying these points.

Description of the Experimental Area
The study was conducted at São Geronimo farm, in a 38-hectare soybean production area in the 2016 crop year, in the municipality of Cândido Mota, state of São Paulo, at coordinates 22°53′24″S, 50°23′27″W.

Mechanized Harvest With Harvest Monitor
The area was harvested with a John Deere 1175 combine harvester manufactured in 2005, equipped with a tangential threshing system and a 19-foot (5.79 m) cutting platform, operating at an average speed of 5.0 km h⁻¹. Yield information was recorded with a Stara Topper 4500 monitoring system configured to record one point every three seconds. A total of 11,948 points were recorded in the 38 hectares.
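As a quick consistency check on sampling density, the sketch below derives the distance between consecutive records and the nominal area swept per record from the speed, platform width, and logging interval given above, and compares it with the observed number of points per hectare. It uses only values reported in the text; the function names are illustrative, and the 5.0 km h⁻¹ figure is an average, so the nominal and observed values need not agree.

```python
# Values reported in the text for this harvest.
SPEED_KMH = 5.0         # average harvester speed (km/h)
HEADER_WIDTH_M = 5.79   # 19-ft cutting platform (m)
LOG_INTERVAL_S = 3.0    # one record every 3 s
N_POINTS = 11_948       # records logged in the field
AREA_HA = 38.0          # field area (ha)


def point_spacing_m(speed_kmh: float, interval_s: float) -> float:
    """Distance travelled between two consecutive records (m)."""
    return speed_kmh / 3.6 * interval_s


def nominal_cell_area_m2(speed_kmh: float, width_m: float, interval_s: float) -> float:
    """Area swept by the full platform width in one logging interval (m^2)."""
    return point_spacing_m(speed_kmh, interval_s) * width_m


spacing = point_spacing_m(SPEED_KMH, LOG_INTERVAL_S)
cell = nominal_cell_area_m2(SPEED_KMH, HEADER_WIDTH_M, LOG_INTERVAL_S)
density = N_POINTS / AREA_HA                    # observed records per hectare
area_per_point = AREA_HA * 10_000 / N_POINTS    # observed m^2 per record

print(f"spacing ~ {spacing:.2f} m, nominal cell ~ {cell:.1f} m^2")
print(f"observed: ~ {density:.0f} points/ha, ~ {area_per_point:.1f} m^2 per point")
```

Running it gives a spacing of about 4.17 m and a nominal cell of about 24.1 m², against roughly 314 points per hectare (about 31.8 m² per point) in the recorded data.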
The harvest monitor has a GNSS receiver that provides latitude, longitude, and altitude, which serve as the reference for locating each point. It also has a capacitance-based moisture sensor for the harvested grain; a platform lift sensor, which makes the monitor stop recording whenever the platform is raised; and a volumetric optical yield sensor installed in the harvester's clean grain elevator, which takes a reading from the interruption of an invisible light beam in each 3-second interval.
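To illustrate how the sensors described above can combine into a single yield value per record, the sketch below converts one hypothetical 3-second record into yield at a reference moisture. It is a generic calculation, not Stara's internal algorithm: the record fields, the grain volume already inferred from the optical sensor, the soybean bulk density (750 kg/m³), and the 13% reference moisture are all assumptions. Records taken with the platform raised are skipped, mirroring the lift-sensor behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawRecord:
    """One 3-s monitor record (hypothetical field names)."""
    grain_volume_m3: float   # grain volume inferred from the optical sensor
    moisture_pct: float      # capacitive moisture reading, wet basis (%)
    speed_kmh: float         # ground speed during the interval
    platform_raised: bool    # platform lift sensor state


def record_yield_kg_ha(
    rec: RawRecord,
    width_m: float,
    interval_s: float = 3.0,
    bulk_density_kg_m3: float = 750.0,   # assumed soybean bulk density
    std_moisture_pct: float = 13.0,      # assumed reference moisture
) -> Optional[float]:
    """Convert one raw record to yield at reference moisture (kg/ha)."""
    if rec.platform_raised:
        return None  # no point is logged while the platform is lifted
    wet_mass_kg = rec.grain_volume_m3 * bulk_density_kg_m3
    # Adjust wet mass to the reference moisture (wet-basis shrink formula).
    std_mass_kg = wet_mass_kg * (100.0 - rec.moisture_pct) / (100.0 - std_moisture_pct)
    distance_m = rec.speed_kmh / 3.6 * interval_s
    area_ha = distance_m * width_m / 10_000.0
    if area_ha <= 0:
        return None  # harvester stopped: no area covered in this interval
    return std_mass_kg / area_ha


rec = RawRecord(grain_volume_m3=0.01, moisture_pct=13.0, speed_kmh=5.0, platform_raised=False)
print(round(record_yield_kg_ha(rec, width_m=5.79), 1))
```

With 0.01 m³ of grain at 13% moisture over one interval at 5.0 km h⁻¹ and a 5.79 m platform, this prints 3108.8 (kg/ha). Errors in any of these inputs (volume calibration, platform width, speed) propagate directly into the yield value, which is one way discrepant points arise.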

Analysis of Discrepant Points (Outliers)
The yield data recorded by the monitor were downloaded into a spreadsheet and then imported into Minitab to create an individual values control chart, one of the SPC tools, testing two multiples of the standard deviation (2σ and 3σ) to identify outliers.
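The text defines σ simply as the standard deviation, and does not state which estimate was used. Minitab's individual values chart, by default, estimates σ from the average moving range (MR̄/d₂, with d₂ = 1.128 for ranges of two consecutive points) rather than from the overall sample standard deviation, and the two can differ considerably when yield drifts across the field. The sketch below computes both for a hypothetical yield series with a gradual trend.

```python
import statistics

D2_SPAN2 = 1.128  # d2 control-chart constant for moving ranges of span 2


def sigma_overall(values):
    """Overall sample standard deviation of the individual values."""
    return statistics.stdev(values)


def sigma_moving_range(values):
    """Short-term sigma: mean of |x_i - x_(i-1)| divided by d2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(moving_ranges) / D2_SPAN2


# Hypothetical yield series (t/ha) in logging order, with a gradual trend.
yields = [2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5]
print(f"overall SD: {sigma_overall(yields):.3f}")
print(f"MR-bar/d2 : {sigma_moving_range(yields):.3f}")
```

Here the overall SD is about 0.245 while the moving-range estimate is about 0.089, so limits built from the moving-range estimate would be much narrower and would flag more points for the same multiple.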
The overall mean of the individual values is defined, according to Montgomery (2009), by Equation 1:
$$\mu = \frac{1}{n_t}\sum_{i=1}^{n_t} n_i \qquad (1)$$
where μ is the mean of the individual values, n_i is the value of observation (sampling point) i, and n_t is the total number of collected points.
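The control limits of the individual values chart can be calculated using Equations 2 and 3:
$$\mathrm{UCL} = \mu + k\sigma \qquad (2)$$
$$\mathrm{LCL} = \mu - k\sigma \qquad (3)$$
where UCL is the upper control limit, LCL is the lower control limit, μ is the mean of the individual values, σ is the standard deviation, and k is the multiple of the standard deviation being tested (2 or 3).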
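The sketch below computes the center line and k-sigma limits of an individual values chart, assuming the limits take the usual form μ ± kσ with σ taken as the overall sample standard deviation (an assumption; see the moving-range alternative), and flags points strictly outside them for k = 2 and k = 3. The yield values are hypothetical.

```python
import statistics


def control_limits(values, k):
    """Return (LCL, mean, UCL) for an individual values chart with k-sigma limits."""
    mu = statistics.fmean(values)       # Equation 1: mean of individual values
    sigma = statistics.stdev(values)    # assumed: overall sample standard deviation
    return mu - k * sigma, mu, mu + k * sigma


def flag_outliers(values, k):
    """True for points strictly below the LCL or strictly above the UCL."""
    lcl, _, ucl = control_limits(values, k)
    return [v < lcl or v > ucl for v in values]


# Hypothetical yields (t/ha): nine typical records and one spike.
yields = [3.0] * 9 + [9.0]
for k in (2, 3):
    lcl, mu, ucl = control_limits(yields, k)
    n_out = sum(flag_outliers(yields, k))
    print(f"k={k}: LCL={lcl:.2f}, mean={mu:.2f}, UCL={ucl:.2f}, outliers={n_out}")
```

With k = 2 the spike lies above the UCL (about 7.39) and is flagged; with k = 3 the UCL rises to about 9.29 and nothing is flagged, partly because the spike itself inflates σ. This is one reason the choice of multiple matters.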
All points below the LCL and above the UCL were removed using QGIS (Quantum GIS version 2.18.3), and a descriptive analysis (mean, standard deviation, and coefficient of variation) was then performed on the set of points remaining for each multiple of the standard deviation.
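The removal and summary steps described above can be sketched as follows: drop points outside μ ± kσ, then report the mean, standard deviation, and coefficient of variation of what remains. This stands in for the QGIS workflow and is not the authors' procedure; the optional `reference_mean` (for example, an area-average yield obtained independently) and the relative-gap output are hypothetical additions for comparing multiples, and the yields are invented.

```python
import statistics
from typing import Optional


def filter_and_describe(values, k, reference_mean: Optional[float] = None):
    """Drop points outside mean +/- k*sd, then summarize the remaining points."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    lcl, ucl = mu - k * sigma, mu + k * sigma
    kept = [v for v in values if lcl <= v <= ucl]   # removal step (QGIS in the study)
    if len(kept) < 2:
        raise ValueError("need at least two remaining points")
    mean = statistics.fmean(kept)
    sd = statistics.stdev(kept)
    summary = {
        "n_removed": len(values) - len(kept),
        "mean": mean,
        "sd": sd,
        "cv_pct": 100.0 * sd / mean,                # coefficient of variation (%)
    }
    if reference_mean is not None:
        # Hypothetical criterion: relative gap to an independently obtained mean.
        summary["gap_pct"] = 100.0 * (mean - reference_mean) / reference_mean
    return summary


# Hypothetical yields (t/ha) with one discrepant record.
yields = [2.9, 3.1, 3.0, 3.0, 2.8, 3.2, 3.0, 2.9, 3.1, 9.0]
for k in (2, 3):
    print(k, filter_and_describe(yields, k, reference_mean=3.0))
```

In this toy case, k = 2 removes the 9.0 record and leaves a mean of 3.0 with a CV of about 4.1%, whereas k = 3 keeps it, giving a mean of 3.6 and a CV of about 52.8%.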
The data c distributio by the red, The two st the area (q descriptive From this points, ass

Results
The data c distributio represente In Figure 3.