Data Quality Methods and Applications in Health Care System: A Systematic Literature Review

Data cataloging tools make it possible to keep records of both qualitative and quantitative information. However, a large amount of data is not always synonymous with quality; in the medical field this issue becomes even more critical given the consequences that the lack of a systematic and rigorous process can have for patients. The analysis was conducted through a systematic review that includes several general cases and practical methodologies of data quality analysis in the health context. The search was made using the keywords "data quality" and "health." The study considers publications from 2014 to 2018 in the subject area of Business, Management and Accounting, written in English; in the case of the Tutto platform, only peer-reviewed journals were chosen. Efficient use of information requires databases that can collect and order health information. However, this is only the first step: data quality attempts to go further through the creation of qualitative or statistical control processes and indicators able to ascertain missing data or identify potential anomalies. The analysis sets the stage for future quality implementation in the clinical pathway and patient management.


Introduction
The availability and quantity of data is today one of the issues most addressed across sectors. Health is no exception: in recent years there has been growing interest in the correct management of qualitative and quantitative data, considered vital to improving service and health outcomes (Frost & Sullivan, 2012).
Before starting the review, we believe it is necessary to highlight some definitions. According to the distinction made by Davenport (1998), by "data" we mean a discrete and objective fact about events; by "information" we mean data that has been transformed through processes that add value: contextualization, categorization, calculation, correction, and condensation.
The term data quality means the ability of data and information to respond optimally to the intended purpose; in particular, it often refers to a process characterized first by a precise knowledge of the elements and second by their management and analysis (Davenport, 1998). The main phase of data quality assessment, however, is the verification of all data management phases; the ultimate goal is to identify any deficiencies and increase data quality while reducing the costs of non-quality (Batini & Scannapieco, 2006; Wills, 2014; Biancone, Secinaro & Brescia, 2018).
The reference standard is ISO 8000 of the International Organization for Standardization; its documents first provide an overview of the concept of data quality and then set out its characteristics in full. The indications, for the reasons explained above, are generic and applicable to any context (ISO, 2015).
The multidisciplinary nature of the subject requires, however, verifying the specific best practices for each field. In the health care case, the World Health Organization has outlined the main characteristics and indicators (World Health Organisation, 2017a, 2017b, 2017c) through a dual approach that includes both theory and practical application.
The basic framework is therefore characterized by two types of sources: the first, ISO 8000, is generic and applicable to different contexts; the second is directly applicable to the healthcare world (Figure 1). Results that are not directly related to the theoretical or practical meaning of the data quality process in the health sector have been excluded.
Tutto is a platform that brings together several collections; the research team set aside the results deriving from Scopus, as they were already present in the systematic review.

Data Extraction
The research team summarized the results by extracting the information of interest. For each article, the following information was selected: author, year, type of study, main characteristics, measures used, applications, data collection method, and reference country (Table 2).
With the same criterion, it was possible to analyse the compatibility of the selected results with the metrics provided by the World Health Organisation (2017a) and mentioned in our introduction (Table 3).

Excluded Results
The search criteria returned 840 results.
Reading the titles and abstracts made it possible to validate 16 sources for the systematic review, equal to 1.90%; the rejection rate was 98.10%, for a total of 824 items.
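The inclusion and rejection rates can be rechecked from the two counts reported in the text. The following is a minimal sketch; the function name is illustrative and not part of the original study.

```python
def screening_rates(total: int, included: int) -> tuple[float, float]:
    """Return (inclusion %, rejection %) rounded to two decimals."""
    if total <= 0 or not 0 <= included <= total:
        raise ValueError("need total > 0 and 0 <= included <= total")
    inclusion = round(100 * included / total, 2)
    rejection = round(100 * (total - included) / total, 2)
    return inclusion, rejection


# Counts reported in the review: 840 search results, 16 validated sources.
inclusion_pct, rejection_pct = screening_rates(840, 16)
print(f"included: {inclusion_pct:.2f}%  rejected: {rejection_pct:.2f}%")
```

With the reported counts, the result agrees with the 1.90% and 98.10% stated in the text.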
While reading the results, a precise classification by subject was assigned (Table 1, Figure 2).
During the analysis, considerable noise in the results was noted. In particular, 36.41% of the cases analyse patient satisfaction, the quality of the treatments and services provided by the health system, as well as reflections on the quality of the healthcare brand. Moreover, 19.17% of the results analyse new IT systems to increase the quality of healthcare and the use of algorithms for managing medical records and medical needs; in 8.50% of cases, however, the articles returned by the database were not relevant to any of the keywords chosen for the systematic review.
A further 7.89% of the excluded results refer to the quality of jobs and health leadership, and 7.65% concern scientific studies on the environment and on the impact of magnetic waves and of PM 2.5 and PM 10 on the quality of life.
Furthermore, 6.07% examine the quality of food in the health context, and 5.70% analyse the quality of public services related to health issues. Another 2.67% consider tourism and health mobility and their correlation with the quality of care, 2.43% of the excluded results analyse Big Data, and 1.33% consider the quality of the care service offered by insurance companies, for example within "Long Term Care" policies.
Finally, 1.21% analyse the impact of privacy on the quality of medical care, 0.61% refer to the supply chain within the drug procurement process, and 0.36% examine the quality of waste recovery.
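As a consistency check on the classification, the reported shares can be summed and converted back to approximate item counts. This is only a sketch: it assumes every percentage refers to the 824 excluded items, the category labels are shortened paraphrases of the text, and the counts are obtained by rounding, so they are estimates and not figures taken from Table 1.

```python
EXCLUDED_TOTAL = 824  # excluded items reported in the text

# Shares (%) as reported in the text; labels are shortened paraphrases.
shares = {
    "patient satisfaction, care and brand quality": 36.41,
    "new IT systems and algorithms": 19.17,
    "not relevant to the keywords": 8.50,
    "quality of jobs and health leadership": 7.89,
    "environment and quality of life": 7.65,
    "quality of food": 6.07,
    "quality of public services": 5.70,
    "tourism and health mobility": 2.67,
    "Big Data": 2.43,
    "insurance care services": 1.33,
    "privacy": 1.21,
    "drug procurement supply chain": 0.61,
    "waste recovery": 0.36,
}


def approximate_counts(pct_by_label: dict[str, float], total: int) -> dict[str, int]:
    """Estimate the number of items per category by rounding total * share."""
    return {label: round(total * pct / 100) for label, pct in pct_by_label.items()}


counts = approximate_counts(shares, EXCLUDED_TOTAL)
print(f"sum of shares: {sum(shares.values()):.2f}%")
print(f"sum of estimated counts: {sum(counts.values())}")
```

Under this assumption the shares add up to 100% and the rounded counts add up to the 824 excluded items, which supports reading all of the percentages as shares of the excluded results.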
According to Verma (2014), a two-stage theoretical health framework is needed. The first stage covers the selection of a particular area of interest on which to focus, the definition of criteria and standards of measurement, and the management and planning of a method. The second stage covers the collection and analysis of the data, the implementation of changes, and the subsequent evaluation of their effects. The application and experience refer to the department of anaesthesia at the Derby City General Hospital.
The qualitative approach provided by Wills (2014) is based on three main elements: the use of "small" data, predictive models, and real-time analysis. The first element reflects the fact that small data are often more easily translated into concrete actions for the benefit of patients. Predictive models make it possible to focus on future forecasts and assign a level of risk to each patient in order to provide prompt assistance. Finally, real-time clinical information brings benefits in the speed of data availability and provides more evidence about past clinical history.
The case analysed by Zozus et al. (2014) provides a theoretical framework on the meanings of data quality. The main definitions reported concern completeness, accuracy, and consistency. Completeness means the presence of a particular datum; this concept can coincide with the presence of particular information requested in a database, calculated for example as a percentage. Accuracy is a property of the data and corresponds to the values assigned arbitrarily by those who manage the data (ISO, 2015). Consistency is understood as an element of uniformity across comparable scenarios, for example coherence between health districts or clinical records.
Qualitative cases only partially respond to the data quality measurements provided by the World Health Organization. For completeness, it is useful to refer to Ledikwe et al. (2014), who show how this characteristic is directly related to the presence of defined health guidelines. For Skyttberg et al. (2016), completeness is instead associated with how fully the fields of electronic health records (EHRs) are filled in. Finally, for Zozus et al. (2014), completeness can be associated with four main elements: 1. completeness of the synthesis elements (for example the columns of a database); 2. completeness of the values inserted in them; 3. completeness of the values in a row; and 4. completeness understood as the opportunity to extract data by column and by row, starting procedures that produce operating percentages useful for healthcare.
Data timeliness is defined by Ledikwe et al. (2014) as the promptness with which health professionals load data; according to Verma (2014), this meaning depends on the type of data and its use.
Internal and external data coherence is considered by Ledikwe et al. (2014): the data management systems of various health districts in Botswana are compared with one another (and therefore internally) and also against an external "Monitoring and Evaluation" (M&E) system at national level. The study by Skyttberg et al. (2016) also contains elements of coherence: in this case, consistency is compared between the databases and the paper documentation available to health personnel, while external consistency comes from the examination of 9 emergency departments in Sweden.
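The column and row views of completeness described for Zozus et al. (2014) can be illustrated with a short sketch. The record layout, field names, and values are hypothetical, and treating `None` as "missing" is an assumption made for illustration; the reviewed studies do not prescribe this implementation.

```python
from typing import Optional

Record = dict[str, Optional[float]]


def column_completeness(records: list[Record], columns: list[str]) -> dict[str, float]:
    """Percentage of records with a non-missing value, per column."""
    if not records:
        raise ValueError("no records to assess")
    return {
        col: 100 * sum(rec.get(col) is not None for rec in records) / len(records)
        for col in columns
    }


def row_completeness(records: list[Record], columns: list[str]) -> list[float]:
    """Percentage of requested columns that are filled in, per record."""
    if not columns:
        raise ValueError("no columns requested")
    return [
        100 * sum(rec.get(col) is not None for col in columns) / len(columns)
        for rec in records
    ]


# Hypothetical clinical records; None marks a value that was never entered.
columns = ["age", "weight", "systolic", "haemoglobin"]
records: list[Record] = [
    {"age": 34, "weight": 70.0, "systolic": 120, "haemoglobin": 13.5},
    {"age": 51, "weight": None, "systolic": 135, "haemoglobin": None},
    {"age": 8, "weight": 25.0, "systolic": None, "haemoglobin": 12.0},
    {"age": 67, "weight": 80.0, "systolic": None, "haemoglobin": None},
]

by_column = column_completeness(records, columns)
by_row = row_completeness(records, columns)
print(by_column)
print(by_row)
```

Reading the same table by column shows which variables are poorly collected, while reading it by row shows which patient records are too sparse to use.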
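A comparison between two sources, such as an electronic database and paper documentation, can be summarized as an agreement rate. The sketch below is illustrative only: the record identifiers, the compared field, and the exact-match rule are assumptions, not the procedure used in the cited studies.

```python
def agreement_rate(source_a: dict[str, object], source_b: dict[str, object]) -> float:
    """Percentage of shared record IDs whose values match in both sources."""
    shared_ids = source_a.keys() & source_b.keys()
    if not shared_ids:
        raise ValueError("the two sources share no record IDs")
    matches = sum(source_a[rid] == source_b[rid] for rid in shared_ids)
    return 100 * matches / len(shared_ids)


# Hypothetical values of one field (e.g. a recorded respiratory rate) per record ID.
electronic = {"p01": 18, "p02": 22, "p03": 16, "p04": 20, "p05": 19}
paper = {"p01": 18, "p02": 24, "p03": 16, "p04": 20}

rate = agreement_rate(electronic, paper)
print(f"agreement on shared records: {rate:.1f}%")
```

Records present in only one source are ignored here; in practice they would more naturally be counted under completeness than under consistency.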

Quantitative Approach
Among the strictly quantitative approaches, it is useful to analyse the study by Mitsunaga et al. (2015), which focuses on the accuracy of data recorded in household registries by health workers in Rwanda. The study set out to create a data quality assessment system based on indicators related to demography and women's health. A register was created for each category, and the quality or non-quality of the data was assessed through direct observation by the researchers and through interviews. Each category then received an accuracy rating ranging from 0 to 5, and estimates were reported with 95% confidence intervals. The qualifying element for the research team, in this case, is the division of the data quality process by patient type and the statistical approach used.
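To make the idea of an accuracy rating with a 95% confidence interval concrete, the sketch below computes the mean of a set of 0 to 5 scores with a normal-approximation interval. The scores are invented, and the interval formula is an assumption: the review does not state how Mitsunaga et al. (2015) computed their intervals.

```python
import math
import statistics


def mean_with_ci(scores: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) using mean +/- z * s / sqrt(n).

    z = 1.96 corresponds to a 95% interval under a normal approximation.
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are needed to estimate spread")
    mean = statistics.mean(scores)
    std_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * std_error, mean + z * std_error


# Hypothetical accuracy ratings (0-5) for one register category.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mean, lower, upper = mean_with_ci(ratings)
print(f"mean accuracy {mean:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For small samples a t-based interval would be wider and arguably more appropriate; the normal approximation is used here only to keep the example short.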
The case treated by Watson et al. (2017) originated within a health program aimed at children suffering from pneumonia. The first data quality action was the creation of a centralized database, which eliminates the logistical problem of transporting information between different workplaces. In particular, the sources included the results of diagnostic tests such as radiographs and digital listening files (for example, of the thorax).
The data and information processed correspond to qualitative parameters for the diagnosis of pneumonia. The analysis had several objectives: 1. monitoring of data input and data quality in the databases almost in real time; 2. monitoring of clinical trial operations in real time (e.g., through sample volume and the time from sample collection to laboratory reception); 3. the insertion of data at several collection points in each site (e.g., laboratory, clinic, first aid); 4. rapid implementation of any changes to the information system required by the clinical staff.
The validation of the system included training aimed at standardizing data entry in the database, together with monthly monitoring of missing data.
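Monthly monitoring of missing data, as described above, can be sketched as a simple grouped count. The field names, dates, and values are hypothetical and are not taken from Watson et al. (2017); `None` is assumed to mark a value that was not entered.

```python
from collections import defaultdict
from datetime import date


def monthly_missing_rate(records: list[dict], required_fields: list[str]) -> dict[str, float]:
    """Percentage of required values that are missing, grouped by collection month."""
    totals: dict[str, int] = defaultdict(int)
    missing: dict[str, int] = defaultdict(int)
    for record in records:
        month = record["collected_on"].strftime("%Y-%m")
        for field in required_fields:
            totals[month] += 1
            if record.get(field) is None:
                missing[month] += 1
    return {month: 100 * missing[month] / totals[month] for month in sorted(totals)}


# Hypothetical entries from a centralized study database.
sample = [
    {"collected_on": date(2017, 1, 5), "chest_xray": "done", "sample_volume_ml": 2.0},
    {"collected_on": date(2017, 1, 20), "chest_xray": None, "sample_volume_ml": 1.5},
    {"collected_on": date(2017, 2, 3), "chest_xray": "done", "sample_volume_ml": None},
]

report = monthly_missing_rate(sample, ["chest_xray", "sample_volume_ml"])
for month, pct in report.items():
    print(f"{month}: {pct:.1f}% of required values missing")
```

A report of this kind gives staff a month-by-month signal of whether training on data entry is reducing gaps.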
The quantitative cases examined also partially meet the criteria outlined by the World Health Organization.
Completeness and timeliness are analysed by Mitsunaga et al. (2015) as essential elements during home visits by health personnel and the compilation of digital folders; in the case of Watson et al. (2017), the analysis refers exclusively to the completeness of demographic data, clinical vaccinations, environmental elements, and risk factors.
Both cases also respond to the principle of internal coherence: in Mitsunaga et al. (2015) this refers to the consistency between the information in the registers and the interviews made by the medical staff, while in Watson et al. (2017) it represents the percentage of data present and corrected in the 9000 clinical cases analysed.

Quantitative & Qualitative Approach
The approach described by Firouraghi et al. (2018) analyses data quality within a haemodialysis database. The database analysed contained 2367 patients with 72 different types of variables. The activity involved the removal of redundant data, the detection and removal of exceptional values, and measures for managing missing values. The statistical methods used variance and standard deviation measurements, quartiles, and frequency histograms to detect anomalous values. The methodology analyses the data by checking for missing information or anomalies, in line with the completeness and timeliness elements required by the World Health Organisation (2017a). The variables taken into consideration show a proportion of missing data between 0 and 19.73%. In this case too, the primary objective is the implementation of strategies able to limit errors in the management of patient data; by improving the quality of data entry, it is possible to reduce the risk of an adverse event, thus increasing efficiency.
Table 2 summarizes the main characteristics of the studies: the author, the year of publication, the type of study, the main features included, the main measures, the applications, the method of data collection, and the country.
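The three cleaning steps described for the haemodialysis database (removing redundant records, detecting exceptional values, and measuring missing data) can be sketched as follows. The variable names and values are invented, and the 1.5 x IQR fence is a common convention chosen here as an assumption; the review only states that quartiles, variance, standard deviation, and histograms were used.

```python
import statistics


def drop_duplicates(rows: list[dict]) -> list[dict]:
    """Remove exact duplicate records while keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (fence width k is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def missing_percentage(rows: list[dict], variable: str) -> float:
    """Percentage of records in which the variable is missing (None)."""
    return 100 * sum(row.get(variable) is None for row in rows) / len(rows)


# Hypothetical laboratory values; None marks a missing measurement.
measurements = [10, 12, 11, 13, 12, 11, 10, 95, None, None]
rows = [{"patient_id": i, "urea": v} for i, v in enumerate(measurements)]
rows.append({"patient_id": 0, "urea": 10})  # redundant copy of the first record

unique_rows = drop_duplicates(rows)
observed = [row["urea"] for row in unique_rows if row["urea"] is not None]
outliers = iqr_outliers(observed)
missing_pct = missing_percentage(unique_rows, "urea")
print(len(unique_rows), outliers, missing_pct)
```

Flagged values are only candidates for review: an extreme laboratory result may be a data entry error or a genuine clinical finding, so removal should follow clinical judgment.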

Findings
The data quality process has many applications; in general, all the research included in the systematic review affirms that the analysis and evaluation of data quality is an essential point. This aspect is complementary to the clinical care delivered to patients. The identification of errors and incorrect practices makes it possible to improve the quality of the databases used and, indirectly, to avoid errors or improper practices (Firouraghi et al., 2018).
One element common to all the research is the need to set up data platforms; cataloging in databases is, therefore, the first preparatory step of the data quality process.
In the cases reported by Ledikwe et al. (2014), Mitsunaga et al. (2015) and Skyttberg et al. (2016), the construction of a digitized space able to host qualitative and quantitative data was achieved by assessing working methods through interviews and observations of the activity of health personnel. In other cases, this was done directly on already operational databases, to which it was possible to apply specific indicators that take into consideration the accuracy and quality of the content (Firouraghi et al., 2018).
The keys to reading and commenting are provided, in most cases, by the selection of variables related to the clinical data available to researchers. Some examples refer to weight, age, body mass, systolic pressure level, or blood test values; other studies instead provide a model of data quality divided by type of patient, for example children under the age of 5 or women between 15 and 49 years.
From the comparison with the requirements of the World Health Organisation (2017a), it emerges that both qualitative and quantitative cases pay much attention to the description of completeness and, in part, also to the timeliness of data. External coherence appears to be present exclusively in qualitative studies (Ledikwe et al., 2014; Skyttberg et al., 2016). Finally, external comparisons were not found in any study.

Conclusion
The systematic review focuses on the issue of data quality within the health sector. The use of high-performing data cataloging systems brings with it the need to reduce risks and errors, which are of particular importance in the healthcare sector.
Today, the tools that technology offers allow greater simplification and greater control of quality.
The analysis conducted focuses on some relevant elements:
- the construction or use of databases functional to the analysis of data quality;
- the presence of both qualitative methods (through interviews and cataloging of data) and quantitative methods (through the use of statistics suited to verifying the quality of data);
- the presence in all the results of processes that include, in addition to the cataloging of the data, the verification of indicators and the consequent modification of how health professionals carry out data retention operations;
- the completeness of the data, considered important and analysed across all the results;
- the absence of comparison cases, with studies instead analysing exclusively their own cases, as suggested by the World Health Organisation (2017a).
Regarding the first point, the practical cases reported consider the construction of a virtual working space through databases as the first functional step for starting the data quality process and collecting health data (Bai et al., 2018; Mitsunaga et al., 2015; Firouraghi et al., 2018; Watson et al., 2017; Wills, 2014).
At this point in the discussion, the approach to data quality appears twofold: on the one hand, results of process and method, among them interviews with health personnel as in the case of Ledikwe et al. (2014); on the other, statistical analysis of medical proxies as in the case of Watson et al. (2017).
What appears clear and common is that the start of data evaluation activities presupposes the meticulous selection of the analysis area. It is then necessary to carry out the cataloguing of the data, almost as if it were a photograph of the operational methods of data management; to calculate indicators such as completeness and internal and external consistency; and finally, to adopt corrective actions coordinated, for example, through highly standardized modules and preventive training courses aimed at health professionals (Ledikwe et al., 2014). Among the most widely used measures, completeness stands out in both qualitative and quantitative results; according to what emerged, it is a standard from which to start the analysis. Even so, in the light of what has been learned, we can state that the data quality process should not stop at a single indicator. In this sense, completeness represents a quantitative value referring to the fullness of data within a database; the overall judgment on data quality should, however, be extended, combining observations, internal points of view, and external comparisons.


Table 2. Characteristics of the studies on data quality in health around the world

Table 3. Selection of qualitative and quantitative results for consistency with the World Health Organization Framework