Review of the Research on Scientific Data Management in Chinese Libraries

Scientific data management has recently become a hotspot and an important research topic for libraries. Grasping the latest developments and directions in this field provides a reference for libraries in China to better carry out scientific data management services. This paper retrieves related papers collected by CNKI (China Knowledge Resource Integrated Database) and analyzes and summarizes the research results of recent years in terms of the concept of scientific data management, management policies, service models, new talent needs, data literacy education, and obstacles. Scientific data management has attracted widespread attention from Chinese information science scholars. However, current research still shows obvious deficiencies: slow development of the scientific data management service model, the lack of a scientific data management policy system, new talent roles that exist only at the theoretical level, and neglect of data and humanistic education. The paper closes with suggestions for promoting the practice of scientific data management in China.

to the subject "Library Information and Digital Library" group browsing; after processing the obtained data, a total of 476 related research papers were obtained, as shown in Table 1. Table 1 shows that in the five years from 2008 to 2012, only a few scholars in China explored the internal mechanisms and paths of library scientific data management services. The number of papers published was small, averaging only 4.8 per year, which indicates that research in this area was still in its infancy. Since 2013, however, the number of research papers in this field has increased rapidly. From 2013 to 2017 the average was 83.4 papers per year, about 17 times the average of the previous five years. (The 2018 figure covers only papers indexed up to May and is therefore incomplete; it does not mean that the number of papers on scientific data management has declined.) Although the field has not yet reached the output of a mature research area, this growth shows that scientific data management research has received wide recognition and attention from the library community in China. The author's statistics on the 476 collected documents show that 378 of them appeared in core journals, accounting for 79.4% of the total. This indicates both rapid progress and good quality in scientific data management research in China.
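
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (476 papers in total, 378 in core journals, yearly averages of 4.8 and 83.4). The variable names are illustrative, and the count implied for the partial year 2018 is derived from these figures rather than read from Table 1.

```python
# Figures quoted in the text; variable names are illustrative.
total_papers = 476
core_journal_papers = 378
avg_2008_2012 = 4.8    # papers per year, 2008-2012
avg_2013_2017 = 83.4   # papers per year, 2013-2017

# Share of papers published in core journals, in percent.
core_share = round(core_journal_papers / total_papers * 100, 1)

# Ratio of the two five-year averages.
growth_ratio = avg_2013_2017 / avg_2008_2012

# Papers implied for the two five-year periods, and the remainder
# left for the partial year 2018 (derived, not taken from Table 1).
papers_2008_2012 = round(avg_2008_2012 * 5)
papers_2013_2017 = round(avg_2013_2017 * 5)
implied_2018 = total_papers - papers_2008_2012 - papers_2013_2017

print(core_share, growth_ratio, implied_2018)
```

Under these figures the core-journal share comes to 79.4% and the ratio of the two averages to roughly 17.4, with 35 papers left over for the part of 2018 covered by the search.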

High-Yield Author Analysis
The author also summarizes the scholars with the largest number of articles in this field; the details are shown in Table 2. The number of people who published 5 or more articles is small, only 11, and only 2 people published 10 or more articles between 2008 and 2018. This shows that domestic scholars pay little sustained attention to the topic, and that the development mode of library scientific data management services is hard to break through, which makes innovative research results difficult to obtain. Scholars need to devote more effort to promoting the development of library scientific data management.

Highly Cited Literature Analysis
To make the study more comprehensive, the author summarizes the 10 most cited articles, as shown in Table 3 (the table lists only the first author). Among them, the author considers the article "Library Scientific Research Data Management and Service Model Discussion", published by Li Xiaohui in 2011, the most representative. It is a pioneering document that predates the "scientific data management heat"; it puts forward distinctive insights into the library data management service model and has had a certain influence on, and guiding significance for, later scholars' research. Most of the highly cited literature focuses on the construction of library data curation services and emphasizes the important role of libraries in scientific data management services. Using foreign libraries as reference cases, these articles also present their own construction plans.

Scientific Data Management Services Research Analysis
Through reading the literature, the author found that research on scientific data management services focuses on concept category, service model, management policy, new talent needs, data literacy education, and obstacle factors.

Concept Category
At present, domestic scholars have no unified term for scientific data: both "scientific data" and "research data" are in use, and the definition of the concept is not uniform either. Several domestic researchers have discussed the definition. For example, Si Li and Xing Wenming (2013) hold that scientific data are the raw data, acquired through scientific and technological activities or by other means, that reflect the nature, characteristics, and laws of change of the objective world, together with the data sets systematically processed and organized from them according to the needs of different scientific and technological activities. Qian Peng (2013) explained the meaning of scientific data in the traditional environment and in the digital environment. He holds that under traditional conditions, scientific data are research data based on experimental observations, usually stored on paper. In the digital research environment, scientific data are the digital representation of research objects and research processes, as well as detailed records of data at different stages of the research process. Lai Jianfei and Hong Zhengguo (2013) defined scientific data in terms of the authenticity and reliability of the data, holding that scientific data are verified and reliable research process data, semi-finished data, and result data. Li Xiaohui (2011) pointed out that scientific data are research data in digital form, including any data that can be stored on a computer during the research process, as well as non-digital data that can be converted into digital form.

Service Model
How to provide researchers with the most professional scientific data management services is one of the topics most intensely discussed by domestic scholars at present. The specific services involved include the construction of scientific data management and sharing platforms, metadata services, research support services, and personalized services.

Scientific Data Management and Sharing Platform Construction
Chinese scholars have found, through literature research and internet surveys of foreign universities' scientific data management and sharing platforms, that the United States, the United Kingdom, and Australia currently have relatively complete services in platform construction. Many iSchool universities have built scientific data management and sharing platforms. Wei Junchao and Zhang Chunfang (2017) compared the functions of foreign scientific data management platforms and found that they mainly cover user management, data management, and data services, and generally provide data management plans, data submission, collection, organization, storage, data sharing, and publishing services. Platforms are roughly divided into two types: institution-built platforms and general scientific data management software. DataStaR and PURR can serve as excellent templates for self-built scientific data management platforms in colleges and universities. General scientific data management software can be divided into institutional repository software (such as DSpace and Fedora) and specialized scientific data management software (such as Dataverse and Nesstar). Yin Shenqin, Zhang Jilong, Zhang Ying, et al. (2013) compared several platform software packages in terms of system functions, metadata standards, and online data analysis functions, and concluded that Dataverse performs relatively well overall.
Compared with the United States, Britain, and Australia, where scientific data management and sharing platforms are widespread, the development of scientific data management services in China lags behind. At present, only three universities (Wuhan University, Fudan University, and Peking University) have an established scientific data management and sharing platform. Wuhan University's scientific data sharing platform provides researchers with data organization, storage, and sharing services (2013). The Fudan University Social Science Data Platform provides data management and online data analysis services (2015). The Peking University Open Research Data Platform is based on Dataverse and provides scientific data storage, publishing, sharing, and management services (2016).

Metadata Services
Metadata is an essential tool for the library's participation in scientific data management activities. Selecting and recommending metadata standards and elements is a basic service of library scientific data management. Huang Ruhua and Qiu Chunyan (2014) introduced the innovative metadata practice of the DataStaR platform at Cornell University. In DataStaR, only four metadata elements must be typed in or selected: "dataset title", "dataset owner", "metadata and data acquisition permission", and "publication target repository". All other elements can be generated automatically or filled in by default. Wang Hui, Witt M., and Dou Tianfang (2015) comprehensively analyzed the case of Purdue University's research repository (PURR). They found that PURR integrates several metadata standards into its description framework and obtains most descriptive metadata elements from an online form that authors fill in when submitting data. Huang Xin and Deng Zhonghua (2017) discussed the content of scientific data metadata services in university libraries from the perspectives of metadata introduction, creation, consultation, and training.
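
The DataStaR practice of requiring only four elements and defaulting the rest can be sketched as a simple validation step. This is a minimal illustration and not DataStaR's actual implementation: the field keys, the `DEFAULTS` values, the function `complete_record`, and the sample record are all hypothetical names and data chosen for this example.

```python
# Hypothetical keys for the four elements the depositor must supply.
REQUIRED_ELEMENTS = (
    "dataset_title",
    "dataset_owner",
    "access_permission",   # permission to obtain metadata and data
    "target_repository",   # repository the dataset is to be published to
)

# Hypothetical elements that a system could fill in by default.
DEFAULTS = {"language": "en", "format": "unknown"}


def complete_record(submitted: dict) -> dict:
    """Check the required elements and fill the rest with defaults."""
    missing = [key for key in REQUIRED_ELEMENTS if not submitted.get(key)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    # Values supplied by the depositor override the defaults.
    return {**DEFAULTS, **submitted}


# Made-up sample submission containing only the required elements.
record = complete_record({
    "dataset_title": "Soil moisture survey",
    "dataset_owner": "Example Lab",
    "access_permission": "public",
    "target_repository": "institutional repository",
})
```

Keeping the required set small lowers the barrier for depositors, while defaults keep every record structurally complete; a real platform would draw its element names from the metadata standard it adopts.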

Research Support Services
Research support service runs through the entire scientific research process and targets its different stages, and its importance and necessity are increasing day by day. (2015) proposed that domestic university libraries can start by investigating the actual needs of researchers and actively explore the content of research support services. Xiao Xiao and Lv Junsheng (2012) believe that the library's research support services should take part in the early planning and data modeling stage of e-research to ensure the long-term preservation of data, and should pay attention to exchange and cooperation with research groups.

Personalized Services
Personalized service is an important way for an organization to distinguish itself from general services, show its uniqueness, and attract long-term users. The key is to understand real user needs and, according to the characteristics of those needs, create high-quality services with high user satisfaction, good feedback, and continuous improvement. Wang Haibiao and Wei Junchao (2017) introduced the personalized data management plan service of the University of Edinburgh, which provides users with DMPonline (a data management planning tool) and data management plan templates. Li Wenwen and Cheng Ying (2017) proposed that personalized scientific data management services in libraries should be developed around the differences in information behavior between disciplines, focusing on the selection of information sources, information retrieval, access and utilization, and academic exchange and cooperation.

Management Policies
Sound scientific data management policies are the guarantee of scientific data sharing. Domestic scholars' research on scientific data management policies falls roughly into two groups: policies of science and technology funding organizations and university policies.

Technology Funding Organization Policies
Si Li and Xing Wenming (2013) studied science and technology funding organizations in the United States, Britain, and Australia. They found that Australian government research funding agencies set no clear requirement for applicants to submit data management plans, whereas the research funding agencies of the United States (NSF, NIH, etc.) and the United Kingdom (six major councils) all require funding applicants to submit data management plans and specify the content and format of the plans, to ensure that scientific data generated during the research process can be effectively preserved and managed.

University Policies
Domestic researchers found that scientific data management policies first appeared in American universities.
Ding Pei (2014) pointed out that most universities in the United States name their scientific data management policies after scientific data retention, storage, and access. Wan Yan Deng Deng (2016) found that Australian universities pay attention to the revision and improvement of scientific data management policies: most of the university data policies registered on the national data service website have been revised or are scheduled for revision within the next 3-4 years. Deng Jia and Zhan Huaqing (2014) investigated and analyzed the scientific data management policies of Monash University and found that its policy hierarchy is clearly divided into three levels: policies, procedures, and guidelines.

Data Librarian
Data librarians should act as the main force in scientific data management activities and lead the development of related work. They have received systematic, specialized training in data management, preservation, and storage, and hold professional qualifications (2012). They can take on responsibilities in the creation, preservation, analysis, supervision, and reuse of data.
Huang Ruhua and Li Nan (2016) proposed that data librarians should be able to use data validation tools and manage scientific data. Guo Sang and Lin Wei (2017) described the functions of data librarian posts in university libraries in terms of job titles, responsibilities, qualifications for employment, and domestic attempts at establishing such posts. Kang Xuqin, Chen Rui, Cheng Jin, et al. (2016) analyzed the key issues facing domestic scientific data management and the development of data librarians, and put forward specific solutions: establishing a system, creating positions, staffing, upgrading capabilities, building support platforms, and improving processes. Sun Liling (2017), based on the Framework of Research Data Librarian's Capability, believes that the core of the data librarian's professional competence consists of the ability to provide access to data, the ability to advocate for and support data management, and the ability to manage data sets.

Subject Librarian
Subject librarians are an important link between the research team and the library. They take on the responsibility of integrating into the research team, understanding the real needs of researchers, and feeding these back to the library in time, which lays a solid foundation for good scientific data management services. They play an important role before, during, and after scientific research. Mu Xiangyang and Hong Yue (2015) took the entire life cycle of scientific data as the main line and explored the work content of subject librarians step by step, setting a new orientation for the role of subject librarians.

Data Literacy Education
Domestic researchers mostly mention user data literacy education when studying foreign scientific data management experience. Because the content of data literacy education differs between colleges and universities, the author divides it into universal education for general users and more professional advanced education.

Universal Education
Meng Xiangbao and Li Aiguo (2014) introduced the general curriculum for data literacy education in some European and American universities. The teaching methods are diverse, including elective courses, lectures, seminars, and online courses. The teaching content includes basic concepts of data management, the operation and use of specific data management and analysis tools, and data management policies and ethics. Wang Weijia, Cao Shujin, and Liao Yunyun (2017) introduced the scientific data management course of the Oregon State University Library, which is divided into six topics: scientific data types; metadata; data storage, backup, and security; ethical and legal issues of research data; sharing and reuse; and long-term preservation.

Advanced Education
Liu Guifeng and Lu Zhangping (2016) conducted an on-site inspection of the University of Illinois at Urbana-Champaign and found that the university has set up a master's degree in data management. Its research directions focus on data collection, representation and management, digital preservation and archiving, and data standards and policies. Si Li, Xing Wenming, Zhuang Xiaozhe, et al. (2015) surveyed data literacy education at iSchool colleges, among which the University of Michigan School of Information, Glasgow University, and the University of Illinois Graduate School of Library and Information Science offered a master's degree related to scientific data management. The University of North Carolina at Chapel Hill offers data management certificates and curriculum programs and has developed doctoral programs and teaching networks for scientific data management. Wu Ming, Hu Huihe, and Chen Xiujuan (2016) believe that data literacy education should be carried out in four ways: setting teaching content according to different disciplines; embedding teaching in the scientific research process; diversifying teaching forms; and emphasizing curriculum assessment and feedback.

Obstacles
Domestic researchers have analyzed the barriers to library scientific data management services in terms of the internal and external environments. For example, Peng Jianbo (2014) and Huang Xin and Deng Zhonghua (2016) believe that internal factors include inadequate financial support, lack of data resources, uneven quality of librarians, and backward technical conditions. Si Li and Xin Juanjuan (2014) and Xie Chunzhi and Yan Jinjin (2013) consider that external barriers include insufficient national policy support, lack of relevant laws and regulations, and the failure of the authorities of various funded projects to make explicit requirements. In addition, Zhou Lihong, Duan Xinyu, and Song Yaqian (2017) mentioned the lack of design and planning services and the narrow range of service methods in the scientific data management services of domestic university libraries. Deng Lijun (2016) pointed out that data literacy education suffers from problems such as limited education modes, lack of training, and incomplete coverage of educational content. Zhu Caiping (2014) believes that existing service practices mainly remain at the shallow level of managing scientific data, and that there are serious shortcomings in data relevance and semantics, which could otherwise improve the application and management of data.

The Lack of Development of Scientific Data Management Service Model
By analyzing the content of academic papers on scientific data management in China, the author finds that domestic research on scientific data management service models generally draws on the experience of data management services at foreign universities. Research on the service model and operational workflow of scientific data management is insufficient; for example, no specific service has been proposed that incorporates the characteristics of Chinese university libraries. The author also examined the scientific data management and sharing platforms that have been established. The platform of Wuhan University could not be accessed. The platforms of Fudan University and Peking University could be visited normally, but both suffer from a single data type, a small volume of data, slow updates, many empty data spaces, a low degree of encryption and sharing of data sets, and a narrow range of subjects (which rules out interdisciplinary research), among other issues. In addition, the author learned from the literature that the sharing platform of Fudan University does not provide usage statistics, so the true utilization of the platform cannot be known, which hinders follow-up feedback and the upgrading of scientific data management services. The author believes that domestic research on scientific data management service models still designs services solely around data life cycle theory and ignores the real needs of researchers; speculation from a single theory cannot yield high-quality services that truly satisfy researchers. Moreover, domestic scholars' research on foreign scientific data management service models is mostly based on accessing online databases, so the actual utilization rate of these services at foreign universities is not clear.

Lack of Scientific Data Management Policy System
Data management policies play a decisive role in scientific data management activities. At present, neither the national level nor the science and technology funding bodies have introduced mandatory policies and regulations. This is perhaps the fundamental reason for the slow progress of China's scientific data management policies. It does not, however, prevent domestic researchers from working from the bottom up and considering the substance of scientific data management policy. At present, few researchers have constructed specific scientific data management policy systems for domestic universities. China is a society ruled by law and does not tolerate academic fraud. Yet researchers studying foreign policies, laws, and regulations rarely address the penalties for providing false data.

New Talents Should Not Only Exist at the Theoretical Level
Domestic research on data librarians lacks practicality. According to the author's investigation, no library currently uses the title of "data librarian". This research has so far remained a purely theoretical idea without any actual action. The author believes that data librarians are indispensable to the development of scientific data management services; this should not remain merely a good idea but should be put on the agenda for developing these services.

Neglect of Data and Humanistic Education
Domestic researchers in the field of data literacy education generally emphasize users' command of the operational techniques of scientific data management, but pay too little attention to the education in data ethics, intellectual property rights, data citation, and data security that data sharing involves. The author believes that the latter is the key to scientific data management and the top priority for future academic development. Scientific data literacy education should not be limited to letting researchers master simple operations on data (such as browsing, uploading, saving, and downloading). At the level of use, more attention should be paid to standardizing academic behavior so that scientific data can retain and add value over the long term.

Reflections on Library Scientific Data Management
1. Focus on the real needs of scientific researchers. To provide scientific data management services (such as setting up data management platforms), libraries must consider the true needs of researchers, starting with questionnaires and with monitoring of their usual retrieval habits on public computers, and select data management platforms that are easy for researchers to operate in practice. Usage statistics should be added to the platform to support later user demand research based on utilization and to improve platform services rationally.
2. Provide distinctive data management services. Libraries should combine the characteristics of the disciplines in their universities and provide researchers with data forms that are easy to fill out and operate. At the same time, they should select metadata standards that already exist or are generally recognized in each field, build a complete set of metadata solutions, and continue research on the metadata resource description framework.
3. Launch linked data push services. In personalized services, researchers should work to provide push services for data, documents, and even images, videos, software, and books that are related to the topic of an individual's scientific data. Researchers should conduct in-depth research on linked data to add value to scientific data.
4. Build a scientific data management policy system. Researchers should consider constructing a university data management policy system in which the university leadership, the science and technology departments, and the library act together. The policy system should specify a reward and punishment mechanism, rules for data citation, and solutions to specific problems (for example, assigning a DOI to each scientific data set).
5. Conduct regular field research. In the future, the study of foreign scientific data management experience should not be limited to accessing overseas online databases. Where conditions permit, field visits should be conducted to gain an in-depth understanding of the real progress and dilemmas of foreign scientific data management and to provide a reference for the development of domestic scientific data management.
6. Set up data librarian posts. Libraries should gather talent, either by employing external staff with data mining backgrounds or by using regular training to cultivate internal librarians' data mining and data analysis capabilities, so as to keep scientific data management operations running smoothly and provide more convenient services.
7. Develop data literacy education. Libraries should, together with the heads of departments, regularly hold data literacy lectures in the departments or in the library, or attach data literacy courses to postgraduate and higher-level teaching. The content should focus on data intellectual property rights, so that researchers clearly recognize that data ethics is a problem that cannot be ignored, and should promote a more standardized form of scientific data management.

Table 1. Distribution of the Dates of Scientific Data Management Research Papers

Table 2. High-yield Author List

Table 3. Highly Cited Literature List

Li Mei (2017) proposed that domestic research university libraries should focus on the needs of scientific research users and integrate the resources of the entire library to provide one-stop research support services. Huang Honghua and Han Qiuming (2017) analyzed an internet survey of 40 university libraries in Australia and found that their research support services lie mainly in the areas of institutional repository construction, IT skills training, scientific data management, research impact measurement, research ethics training, and research publication.

E Lijun and Cai Lijing

Fan Aihong and Schmidle D. (2012) analyzed the nature and work content of the new subject librarians of Cornell University Libraries and identified the development characteristics of the new role: a shift from service provider to academic partner; application of the latest information technology in the library; and higher requirements for librarians' skills and quality. Hu Shaojun (2016) believes that subject librarians engaged in scientific data management should be comprehensive, high-quality talents with an advanced awareness of data management services, the ability to integrate data and collection resources, data analysis and data mining capabilities, proficiency in using data platforms, and data management consulting and mentoring capabilities.