Development of Four-Column Data Storage Model for Data-Manipulation of Greenhouse Gases and Soil Properties

Studies on greenhouse gas emissions and climate change require huge amounts of data. These data are generally obtained directly from measuring devices or from other systems, such as geographical information systems and specialized systems used to store and manage experimental data. Database technology has advanced considerably over the years. However, even with these advances and the availability of supercomputers, processing large amounts of data from different data storage methods is still a challenge. In the present study, one physical data table was generated for data manipulation and data storage using a web-based platform. Storing data in four columns allows continuous refinement during the research process, because new variables can be determined after the entity identification process.


Introduction
Apart from water vapour, carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O) are important greenhouse gases (GHG), contributing 60, 20 and 6%, respectively, towards global warming (IPCC, 2007). Global temperatures have increased by 0.8 °C over the past century and are predicted to increase by another 1.1-6.4 °C over the next century (Peters et al., 2013). The atmospheric concentrations of CO2, CH4 and N2O increased from pre-industrial levels of 280 ppm, 715 ppb and 270 ppb, respectively, to about 379 ppm, 1,774 ppb and 319 ppb in 2005 (IPCC, 2007). Although the concentrations of CH4 and N2O are much lower than that of CO2, their global warming potentials are 25 and 298 times that of CO2, respectively (IPCC, 2007). Ravishankara et al. (2009) reported that N2O can also deplete the stratospheric ozone layer. CO2 emissions from soil respiration and vegetation are the principal sources through which this gas enters the atmosphere (Raich & Schlesinger, 1992). Rice cultivation is considered one of the most important anthropogenic sources of CH4, accounting for 10-15% of global CH4 emissions to the atmosphere (Cheng et al., 2008). Agricultural soil is a major source of N2O (Hayakawa et al., 2009; Singla et al., 2013, 2014a; Singla & Inubushi, 2015). Many published studies have examined the emissions of these GHG under various soil types, cropping systems, and irrigation and fertilizer management practices (Cabrera et al., 1994; Glatzel et al., 2004; Kong et al., 2013; Singla & Inubushi, 2014a, b; Singla et al., 2014b).

It would be useful and time-saving if such large amounts of data could be stored in one project on a single platform. A data-manipulation model could meet this requirement and provide a useful platform for data storage. Generally, data mining projects consist of three essential tasks: data collection, data preparation and data modelling (Pyle, 1999; Westerman, 2001). Such models also make it possible to store different kinds of data in one data table on a single platform (Imhoff et al., 2003). Their benefits are greatest when many different types and large amounts of data must be compiled in one project. Users can enter data as in ordinary (relational) data tables while working with the model, and the model's integrated mechanisms ensure that the system stores the input data in one physical (non-virtual) data table. The objective of the present study was to develop input correlations between soil attributes, their GHG production potential, and the geographical information system (GIS) data of sampling parameters on one platform.
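As a worked illustration of the global warming potentials quoted above (IPCC, 2007, 100-year time horizon), emissions of the three gases can be expressed on a common CO2-equivalent mass basis, where E denotes the emitted mass of each gas. This is a standard conversion shown only to make the factors concrete, not a calculation performed in this study:

```latex
E_{\mathrm{CO_2\text{-}eq}} = E_{\mathrm{CO_2}} + 25\,E_{\mathrm{CH_4}} + 298\,E_{\mathrm{N_2O}}
```

For example, 1 kg of CH4 plus 1 kg of N2O corresponds to 25 + 298 = 323 kg CO2-eq.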

Database
Oracle Application Express (APEX) was chosen for geographical and experimental data storage. APEX offered easy access because users could share work (at hierarchical levels) on the same web platform without installing any software. At the first stage, data from both countries (Japan and Hungary) were stored in APEX data tables. The structure of entity identification was determined using keys (Figure 1). The data tables used in the GHG workspace are available at: http://apex.oracle.com/pls/apex/f?p=4500:1000:8427548650555

Figure 1. The Measurement, Country, Location, Sampling_site, Sampling_land_use, Soil, S_P_Type, S_M_Unit, P. Type, Lab or party, and Gas or soil properties tables were created to store data on the APEX platform. ID columns were used to create relationships between these data tables. Numeric values can be entered in the ID, value, repeat number, latitude, longitude, altitude (mBf), ferti N, ferti P, ferti K, m.temp., m.prec. and texture columns. Text can be entered in the name column. Boolean options can be selected in the lab or party and gas or soil properties columns.
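The ID-based relationships between the APEX tables can be illustrated with a minimal Java sketch. It models three of the tables (Country, Location, Sampling_site) as Java records and follows the ID links from a sampling site to its country. The foreign-key fields (countryId, locationId), the class and method names, and all sample values are hypothetical and are not taken from the APEX workspace.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Simplified stand-ins for three APEX tables. The foreign-key fields
// countryId and locationId are hypothetical names for the ID links.
record Country(int id, String name) {}
record Location(int id, int countryId, String name) {}
record SamplingSite(int id, int locationId, double latitude,
                    double longitude, double altitudeMbf) {}

class RelationalLookup {
    // Builds an ID -> row index for one table (IDs must be unique per table).
    static <T> Map<Integer, T> indexById(List<T> rows, Function<T, Integer> idOf) {
        return rows.stream().collect(Collectors.toMap(idOf, Function.identity()));
    }

    // Follows the ID relationships: sampling site -> location -> country.
    static String countryOfSite(int siteId, List<SamplingSite> sites,
                                List<Location> locations, List<Country> countries) {
        SamplingSite site = indexById(sites, SamplingSite::id).get(siteId);
        if (site == null) {
            throw new IllegalArgumentException("unknown site ID " + siteId);
        }
        Location location = indexById(locations, Location::id).get(site.locationId());
        if (location == null) {
            throw new IllegalStateException("missing location ID " + site.locationId());
        }
        Country country = indexById(countries, Country::id).get(location.countryId());
        if (country == null) {
            throw new IllegalStateException("missing country ID " + location.countryId());
        }
        return country.name();
    }

    // Illustrative rows only; the coordinates are placeholders, not study data.
    static String demo() {
        List<Country> countries = List.of(new Country(1, "Japan"), new Country(2, "Hungary"));
        List<Location> locations = List.of(new Location(10, 2, "Location X"));
        List<SamplingSite> sites = List.of(new SamplingSite(100, 10, 0.0, 0.0, 0.0));
        return countryOfSite(100, sites, locations, countries);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints Hungary
    }
}
```

In this relational layout every link is an explicit relationship between two tables, which is exactly what the single-table model of the next section is designed to avoid.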

Data-Manipulation Model
The developed data storage model, called Joker Tao (JT), was used to store data from different systems, such as GIS data and experimental input data, in one data table. After data entry into the APEX data tables, the data on CO2, N2O and CH4 production/consumption potentials and the soil properties under different land-use types in central Japan and eastern Hungary (Kong et al., 2013) were stored in one physical data table with four columns (Figure 2). Data lines were created either manually or in response to events (prompts from the model). Different data, together with their attributes and correlations, were stored in a uniform storage format, and each attribute was key-indexed. The very large number of data lines produced a qualitative leap in this system model. Data lines created by the model were generated automatically, following the model's logic. The attributes of data storage, data maintenance and properties function both as cognitive elements and as features, and at the same time monitor and enforce the feedback environment.

The most elementary entity of a set is an object, which is determined by three factors: an attribute describing its uniqueness, an attribute of the specified and characterized entity, and the value of that attribute (Table 2). This set-theoretic record shows how the identification process was completed. The data in the Baseline and Union main columns are the same, because the base event consists of determining sets A and B. Cardinality was used to store numeric data as values. Intersection and minus (set difference) operations were used to represent queries on the dataset. The relationship between sets A and B determined how the model interprets the input data, because at the data entry stage any data item could be treated as either an entity or an attribute.
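The set operations named above can be illustrated with a short Java sketch. It assumes a hypothetical four-column row (ID, genus_proximus_id, attribute, value); these column names, the attribute names and the values are illustrative and are not taken from the JT implementation. Sets A and B are sets of entity IDs selected by attribute and value, a query is their intersection or difference, and the cardinality of a result is simply its size.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical four-column row; the column names are assumptions.
record Row(int id, int genusProximusId, String attribute, String value) {}

class SetQueries {
    // Set of entity IDs having at least one row with attribute = value.
    static Set<Integer> select(List<Row> table, String attribute, String value) {
        return table.stream()
            .filter(r -> r.attribute().equals(attribute) && r.value().equals(value))
            .map(Row::id)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // A intersect B
    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // A minus B (set difference)
    static Set<Integer> minus(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Illustrative rows only, not data from the study.
    static List<Row> demoTable() {
        return List.of(
            new Row(1, 0, "country", "Japan"),
            new Row(1, 0, "land use", "paddy"),
            new Row(2, 0, "country", "Japan"),
            new Row(2, 0, "land use", "forest"),
            new Row(3, 0, "country", "Hungary"),
            new Row(3, 0, "land use", "paddy"));
    }

    public static void main(String[] args) {
        List<Row> table = demoTable();
        Set<Integer> a = select(table, "country", "Japan");  // A = {1, 2}
        Set<Integer> b = select(table, "land use", "paddy"); // B = {1, 3}
        Set<Integer> both = intersect(a, b);
        System.out.println("A intersect B = " + both + ", cardinality = " + both.size());
        System.out.println("A minus B = " + minus(a, b));
    }
}
```

Because every attribute lives in the same two columns, any attribute/value pair can define a set without knowing in advance which table or column it belongs to.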

Results
The structure of data storage was determined by the attributes conceptual-functional name, functional logic and physical functionality. Tables for sampling country, sampling land use, sampling location and sampling sites were created to hold the sampling attributes. Tables for measurements, measurement units, measurement types and soil property types were created to hold the measured data (Figure 1).

Discussion
Unlike relational data storage models that use multiple tables (Imhoff et al., 2003), the present model used only one physical data table for data manipulation. There is therefore no need to create relationships between data tables or to expand tables horizontally with new columns, either of which could slow down queries. This model uses more data rows than relational multi-level database systems, but the use of path numbers can shorten the search path while a query is being completed. Storing data in four columns allows continuous refinement during the research process, because new variables can be defined after the entity identification process. Data attributes, soil sampling parameters, measured values and GHG production/consumption potentials were stored in one physical data table, from which they can be retrieved by integrated mechanisms on one platform. This gives researchers and developers a tool for coordinating parallel teamwork and continuous data entry.
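The claim that new variables can be added without horizontal column expansion can be sketched in Java. In a relational table, a newly identified variable would typically require an SQL ALTER TABLE statement that adds a column; in a four-column layout it is just another row. The row layout (ID, genus_proximus_id, attribute, value), the class and method names, and the placeholder values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical four-column row; the column names are assumptions.
record Row(int id, int genusProximusId, String attribute, String value) {}

class VerticalExpansion {
    // The physical table never gains columns: every row has these four fields.
    static final int COLUMN_COUNT = Row.class.getRecordComponents().length;

    // Storing a variable, including one identified later in the research,
    // appends a row instead of altering the table structure.
    static void addValue(List<Row> table, int entityId, String attribute, String value) {
        table.add(new Row(entityId, 0, attribute, value));
    }

    // Distinct variables currently present in the table.
    static Set<String> variables(List<Row> table) {
        Set<String> names = new TreeSet<>();
        for (Row row : table) {
            names.add(row.attribute());
        }
        return names;
    }

    // Placeholder values only, not measurements from the study.
    static List<Row> demo() {
        List<Row> table = new ArrayList<>();
        addValue(table, 1, "CH4 production potential", "0.0");
        addValue(table, 1, "N2O production potential", "0.0");
        // A variable defined later: one more row, still four columns.
        addValue(table, 1, "soil pH", "0.0");
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = demo();
        System.out.println(variables(table) + " stored in " + COLUMN_COUNT + " columns");
    }
}
```

The trade-off is the one the text notes: the table grows in rows rather than columns, so search strategy (here, path numbers) matters more than schema design.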
We created a customized code table that allows any data item, whether an entity, an attribute, a data connection or a formula, to be stored and managed in one physical data table. JT also handles relational data models differently: it allows data lines with the same ID value to identify a single entity. JT likewise differs from non-relational models, because input data are not stored in an unstructured form; this makes it easier to manage huge amounts of data without creating several applications. Compared with NoSQL models, the speed of the JT system comes from its use of vertical data expansion and from avoiding sequential search, which slows queries. JT is self-learning, as all equations can be stored in the code table while retaining the characteristics of the balance method, in the same way as data stored in the physical data table.
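How data lines sharing an ID can be read back as a single entity is sketched below in Java. It assumes a hypothetical four-column row (ID, genus_proximus_id, attribute, value) and, as a further assumption, treats genus_proximus_id as a pointer from an entity to the more general entity it belongs to. Class names, method names and sample rows are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical four-column row; the column names are assumptions.
record Row(int id, int genusProximusId, String attribute, String value) {}

class EntityAssembly {
    // Rows that share an ID describe one entity: gather them into
    // an attribute -> value map per ID, keeping the table order.
    static Map<Integer, Map<String, String>> entities(List<Row> table) {
        Map<Integer, Map<String, String>> byId = new LinkedHashMap<>();
        for (Row row : table) {
            byId.computeIfAbsent(row.id(), k -> new LinkedHashMap<>())
                .put(row.attribute(), row.value());
        }
        return byId;
    }

    // Returns the genus_proximus_id stored with an entity's rows
    // (assumed here to point to the more general, parent entity).
    static int genusOf(List<Row> table, int id) {
        for (Row row : table) {
            if (row.id() == id) {
                return row.genusProximusId();
            }
        }
        throw new IllegalArgumentException("no rows for ID " + id);
    }

    // Illustrative rows: entity 100 is a general "sampling site" entity,
    // entities 2 and 3 are individual sites belonging to it.
    static List<Row> demoTable() {
        return List.of(
            new Row(100, 0, "name", "sampling site"),
            new Row(2, 100, "name", "Site A"),
            new Row(2, 100, "altitude (mBf)", "0"),
            new Row(3, 100, "name", "Site B"));
    }

    public static void main(String[] args) {
        List<Row> table = demoTable();
        System.out.println(entities(table));
        System.out.println("genus of entity 2: " + genusOf(table, 2));
    }
}
```

No join is needed: an entity is whatever set of rows carries its ID, so entities with different numbers of attributes coexist in the same table.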
The present study showed that a four-column data storage model, which uses one physical table to store data from multiple tables, can be useful for research and development projects in any discipline.

Table 2. Set theory of entity identification with the developed four-column data storage system model.

The process of inserting data from multiple tables into one physical data table was implemented in Java.
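A minimal sketch of such an insertion, written in Java under stated assumptions and not the authors' own code, could look as follows. Each relational row becomes one four-column row per non-ID cell, with the column name stored as the attribute. The row layout (ID, genus_proximus_id, attribute, value), the use of genus_proximus_id to record the source table, the assumption that IDs are unique across source tables, and all names and sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical four-column row; the column names are assumptions.
record Row(int id, int genusProximusId, String attribute, String value) {}

class MultiTableInsert {
    // Converts the rows of one relational table into four-column rows.
    // Each source row is a column -> value map that must contain "ID";
    // IDs are assumed to be unique across all source tables.
    // tableEntityId is a hypothetical ID for the entity representing the
    // source table, stored in genusProximusId so each row keeps its origin.
    static List<Row> flatten(int tableEntityId, List<Map<String, String>> sourceRows) {
        List<Row> physical = new ArrayList<>();
        for (Map<String, String> source : sourceRows) {
            String rawId = source.get("ID");
            if (rawId == null) {
                throw new IllegalArgumentException("source row has no ID column: " + source);
            }
            int id = Integer.parseInt(rawId);
            for (Map.Entry<String, String> cell : source.entrySet()) {
                if (!cell.getKey().equals("ID")) {
                    // Each non-ID cell becomes one row; the column name becomes the attribute.
                    physical.add(new Row(id, tableEntityId, cell.getKey(), cell.getValue()));
                }
            }
        }
        return physical;
    }

    // Builds an insertion-ordered column -> value map from alternating keys and values.
    static Map<String, String> cells(String... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            row.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return row;
    }

    // Two illustrative source tables merged into one physical table.
    static List<Row> demo() {
        List<Row> physical = new ArrayList<>();
        physical.addAll(flatten(1000, List.of(          // Country table
            cells("ID", "1", "name", "Japan"),
            cells("ID", "2", "name", "Hungary"))));
        physical.addAll(flatten(1001, List.of(          // Location table
            cells("ID", "10", "name", "Location X", "country ID", "2"))));
        return physical;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The Location table's link to Hungary survives as an ordinary row (attribute "country ID", value "2"), so the relationship is stored as data rather than as a constraint between tables.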

Table 3. Relationship determination. The variable genus_proximus_id was used for hierarchical entity identification; a value of 1 in the fourth row means that the entity identified by the ID 2 records is related to the entity identified by the ID 100 records.

The associated attributes of this system model mean that the system can initiate searches in a parallel processing mode to complete queries. Each time a search path between sets A and B is used, it is marked ("engraved") and its path number is incremented (path number = path number + 1). The model logs paths whose path number becomes large, and more frequent searches are given priority. This search mode operates over large numbers of data rows and cognitions. One data table was used to identify attributes (columns), entities (rows) and relationships between the stored data. These relationships were determined with the ID and genus_proximus_id variables in the four-column physical data table.
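The path-number bookkeeping described above can be sketched as one counter per search path. This is a minimal, single-threaded Java sketch: the parallel processing mode is not modelled, and the path key format ("A->B"), the logging threshold and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathNumbers {
    // Path number per search path; the key format "A->B" is hypothetical.
    private final Map<String, Integer> pathNumber = new HashMap<>();

    private static String key(String fromSet, String toSet) {
        return fromSet + "->" + toSet;
    }

    // Marks a search path each time it is used: path number = path number + 1.
    void mark(String fromSet, String toSet) {
        pathNumber.merge(key(fromSet, toSet), 1, Integer::sum);
    }

    int count(String fromSet, String toSet) {
        return pathNumber.getOrDefault(key(fromSet, toSet), 0);
    }

    // Paths sorted by descending path number (ties by name), so that
    // frequently used searches are tried first.
    List<String> priorityOrder() {
        List<String> paths = new ArrayList<>(pathNumber.keySet());
        paths.sort((p, q) -> {
            int byCount = Integer.compare(pathNumber.get(q), pathNumber.get(p));
            return byCount != 0 ? byCount : p.compareTo(q);
        });
        return paths;
    }

    // Paths whose path number has reached a (hypothetical) logging threshold.
    List<String> frequentPaths(int threshold) {
        List<String> logged = new ArrayList<>();
        for (String path : priorityOrder()) {
            if (pathNumber.get(path) >= threshold) {
                logged.add(path);
            }
        }
        return logged;
    }

    static PathNumbers demo() {
        PathNumbers paths = new PathNumbers();
        paths.mark("A", "B");
        paths.mark("A", "B");
        paths.mark("A", "C");
        return paths;
    }

    public static void main(String[] args) {
        PathNumbers paths = demo();
        System.out.println(paths.priorityOrder());  // [A->B, A->C]
        System.out.println(paths.frequentPaths(2)); // [A->B]
    }
}
```

Ordering by path number is one simple way to make frequent searches a priority; a real implementation would also need thread-safe counters if searches run in parallel.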