A Review on Personalized Academic Paper Recommendation

With the advent of the era of big data, it has become extremely easy for scientific users to access academic papers, which has enhanced their efficiency and capacity to search for and browse papers. However, this ease of access also brings problems such as the explosion of the literature and information overload. Many researchers therefore focus on academic paper recommendation services, hoping to help scientific users find documents more efficiently and to recommend papers of interest or potential interest that could assist academic users in their research. Through a literature review, this paper makes a comprehensive summary of the research on personalized academic paper recommendation, presenting the state of the art of academic paper recommendation methodologies, pointing out their pros and cons, and indicating the primary evaluation metrics and popular datasets. Finally, we look ahead to the research trends of personalized academic paper recommendation as a reference for interested researchers.


Introduction
An academic paper is a theoretical or empirical article that expresses scientific research results in an academic field or subject area. Generally speaking, it presents systematic and specialized knowledge in order to discuss or study a specific problem or research result. Such articles are academic, scientific, and creative in nature. Generally, academic papers can be classified in three ways. 1) First, according to the discipline of research, academic papers can be divided into natural science papers and social science papers.
2) Second, according to the form and level of research, academic papers can be divided into theoretical research papers, applied research papers, and compilation papers. Theoretical research papers focus on the basic concepts and principles of various disciplines; applied research papers focus on transforming research results and knowledge into professional technologies and production techniques that directly serve society.
3) Third, according to the type of carrier, academic papers can be divided into journal papers, conference papers, theses, dissertations, etc.
In the era of underdeveloped networks, research users mainly obtained published academic results and information through printed journals and copies in order to understand the research content and methods of other researchers, but this approach had high latency and low efficiency. With the advent of the Internet, the major database platforms and their services, especially online retrieval systems, have become an important way for researchers to obtain academic resources. In addition, with the popularity of open access and institutional repositories, many preprint websites such as arXiv have emerged. At present, mainstream database retrieval platforms include Web of Science, Scopus, Ei Village, etc. In China, the China National Knowledge Infrastructure (CNKI) and Wanfang are prominent. Generally, schools, institutions, and other organizations pay subscriptions to use these systems. In addition, many well-known Internet search engine companies provide academic search platforms, such as Google Scholar, Bing Academic, and Baidu Academic. Research users can use the various ways of searching, browsing, and reading provided by these platforms to enjoy the convenience brought by the Internet.
Because information spreads so quickly over the Internet, academic publishing has entered an era of explosive growth. A vast and wide-ranging body of papers is published every day. Researchers can access these papers to obtain the latest literature and state-of-the-art methods in their subject and to gain insights, which greatly boosts academic innovation. However, with the explosion of academic papers and changing demand, many researchers are confronted with the problem of "information overload".
This is how the topic of academic paper recommendation came into being. Academic paper recommendation means that academic database providers, search platforms, and search engines use a recommendation mechanism to recommend academic papers to academic users, making search and browsing more efficient and reducing the time researchers spend on searching, browsing, and reading. This leaves researchers more time to focus on their core research and to conduct experiments. Academic paper recommendation is a field worthy of further exploration and research.
In this paper, we review the field of personalized academic paper recommendation. The remainder of the article is structured as follows: Sect. 2 briefly reviews the literature, and Sect. 3 presents a systematic summary of the recommendation methodologies applied in academic paper recommender systems. Sect. 4 lists the mainstream evaluation metrics and popular datasets. Sect. 5 presents our insights on the topic. The last section concludes the paper.

Related Works
The first recorded recommender system (RS) was described by Salton (Salton & McGill, 1986), who presented a lexical vector-based algorithm applied to document retrieval. The term itself, however, was first proposed by Resnick et al. (Resnick & Varian, 1997), who considered "recommender system" a more appropriate description of the technique than "collaborative filtering". As a tool for coping with the era of "information explosion", recommender systems have played a major role and shown great potential in e-commerce, news media, social networking, and film and television entertainment. Later, researchers began to apply recommender systems to academic paper recommendation.
Research on the recommendation of academic papers began in the 1990s. Every day, a large number of academic articles appeared on the Internet. Meanwhile, researchers wasted a lot of time and effort searching for articles that were useful and relevant to their research, often without finding satisfactory ones. In 1998, Bollacker et al. developed a search engine called CiteSeer that assists scientific users in document retrieval and promotes the dissemination of, and feedback on, academic literature in many ways (Bollacker, Lawrence, & Giles, 1998). Given some target keywords, CiteSeer uses Web search engines and heuristics to help users locate and download related documents, which greatly improves retrieval efficiency. CiteSeer also supports simple browsing and retrieval, and it recommends similar documents to users by analyzing keywords and co-citation relationships. For this reason, CiteSeer can be considered the "prototype" of the academic paper recommendation system. After that, many researchers began to study academic paper recommendation.

Personalized Paper Recommendation Methodologies
Generally, recommendation methods can be divided into the following categories: content-based recommendation, collaborative filtering, graph-based recommendation, hybrid recommendation, and deep learning-based recommendation. Each is described in detail below (see Table 1).

Content-based Recommendation
Content-based recommendation is one of the most widely used methods in the recommender system domain. Its core is to infer the user's interests from the items the user has interacted with, and then to build a user interest model from the content of those items. The recommender system finds similarity relationships between items based on their feature information, and then recommends to users other items similar to the ones they like. The selection of item features is therefore very important and closely tied to the performance of the recommender system.
In an academic paper recommendation system, the interaction between the user and the system is usually characterized by the authors of the paper, the information in the paper (e.g. title, abstract, full text), its tags, and behaviors such as browsing, reading, and downloading. Researchers often need to extract a variety of terms (keywords, etc.) from the title, abstract, and even the full text of the paper. With the advent of Web 2.0 and Web 3.0, some scholars began to extract terms from academic community tags (Jack, 2012), the ACM classification tree (Middleton et al., 2001), and citation context (S. Huang et al., 2004).
The vector space model (VSM) is the most popular model for storing paper representations and user models, although some researchers use graphs to store user models (Ozono et al., 2002). Philip et al. proposed a comprehensive algorithm (Philip et al., 2014) based on TF-IDF and the cosine similarity measure, which makes better use of the user's search records and helps users find the best papers. Kazemi et al. compared two well-known content-based recommendation methods, TF-IDF and word embedding, in terms of their efficiency and usability for abstract extraction and for feature generation in recommender systems (Kazemi & Abhari, 2017). Choochaiwattana proposed an academic paper recommendation method based on users' paper annotations (Choochaiwattana, 2010). The experimental results show that this paper recommendation method, based on user papers, authors, etc., achieves good accuracy and F-measure. Sugiyama et al. recommended academic papers based on the user's recent research interests (Sugiyama & Kan, 2010). Amami et al. proposed an academic paper recommendation method based on the LDA topic model (Amami et al., 2016) and carried out a preliminary test on DBLP, with good recommendation results.
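As an illustration of the TF-IDF and cosine similarity approach mentioned above, the following is a minimal sketch rather than the algorithm of any cited work. It assumes a user profile built by concatenating the texts the user has interacted with, whitespace tokenization, and one common smoothed IDF variant; the function names and the toy papers are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real system would also stem and drop stop words)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict term -> weight; terms unseen in the corpus are ignored."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user_texts, candidates, top_n=2):
    """Rank candidate papers (dict id -> text) by similarity to the user profile."""
    idf = build_idf(list(candidates.values()) + user_texts)
    profile = tfidf_vector(" ".join(user_texts), idf)
    scored = [(pid, cosine(profile, tfidf_vector(text, idf)))
              for pid, text in candidates.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]


# Hypothetical toy data: three candidate papers and one paper the user has read.
candidates = {
    "p1": "topic model for paper recommendation",
    "p2": "protein folding simulation",
    "p3": "collaborative filtering for recommendation",
}
ranking = recommend(["paper recommendation with topic model"], candidates)
```

Here `ranking` lists `p1` first because it shares the most weighted terms with the user profile, while the unrelated `p2` receives a similarity of zero and falls outside the top two.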

Collaborative Filtering Recommendation
The main idea of collaborative filtering is to use the preferences of a group of users with similar interests or shared experiences to recommend information of interest to a given user. Through this "collaboration" mechanism, the system records individual users' responses to items or content (such as ratings, likes, and bookmarks) and uses this information to help recommend content to other users. Broadly speaking, collaborative filtering is one of the most widely used recommendation methods, although it is applied less often than content-based recommendation in the field of academic paper recommendation.
McNee et al. introduced the collaborative filtering mechanism into the recommendation of academic papers (McNee et al., 2002). They used the citation network between papers to create a rating matrix for literature recommendation. Pennock et al. proposed a method called personality diagnosis (PD) to recommend documents to similar users (Pennock et al., 2000). Based on the user's preferences for items, the method calculates the probability that the user is of the same type as each other user, and then the probability that the user will like a new item. This method preserves the advantages of traditional similarity calculation methods, and new data can be added at any time. In addition, Vellino compared two kinds of digital library academic paper recommender systems experimentally (Vellino, 2010), usage-based recommendation and citation-based recommendation, and found that each has its own advantages and disadvantages.
However, collaborative filtering requires user ratings, and users' motivation to participate is generally low, so it easily runs into the "cold start" problem, which arises when a new user or a new item enters the system. To overcome the "cold start" problem, some researchers (McNee et al., 2002; Yang et al., 2009) tried to use implicit ratings derived from the interaction behavior between users and items, or a voting mechanism based on the citation network. But the problem cannot be eliminated completely. In addition, collaborative filtering also faces other negative effects of implicit ratings (Councill et al., 2008; MacRoberts & MacRoberts, 1996).
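The idea of deriving a rating matrix from the citation network can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of McNee et al.: each citing paper is treated as a "user", each citation as an implicit positive rating, and neighbors are weighted by cosine similarity. The data and function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy citation data: citing paper -> set of cited papers.
# Each citing paper plays the role of a "user"; each citation is an implicit rating of 1.
citations = {
    "A": {"x", "y"},
    "B": {"x", "y", "z"},
    "C": {"y", "z"},
    "D": {"w"},
}


def cosine_sets(s, t):
    """Cosine similarity between two binary rating rows represented as sets."""
    if not s or not t:
        return 0.0
    return len(s & t) / math.sqrt(len(s) * len(t))


def recommend_citations(target, ratings, top_n=3):
    """User-based collaborative filtering over the binary citation matrix."""
    scores = defaultdict(float)
    for other, cited in ratings.items():
        if other == target:
            continue
        sim = cosine_sets(ratings[target], cited)
        if sim == 0.0:
            continue
        for paper in cited - ratings[target]:
            scores[paper] += sim  # similarity-weighted vote for papers not yet cited
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recs = recommend_citations("A", citations)

# Cold start: a new paper "E" with no citations has no neighbors, so nothing is recommended.
cold = recommend_citations("E", {**citations, "E": set()})
```

For paper `A`, both `B` and `C` vote for `z`, so `z` is the only recommendation. The empty result for `E` shows why a purely collaborative approach cannot serve new users or new items without some additional signal.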

Hybrid Filtering Recommendation
When several types of recommendation methods (usually two or more), whether mentioned above or not, are combined, a hybrid recommendation method is formed, which can make up for the deficiencies of any single method. For example, Wang et al. proposed a method that combines collaborative filtering with a probabilistic topic model (Wang & Blei, 2011) and used it to recommend scientific papers. Amami et al. proposed a graph-based hybrid academic paper recommendation method (Amami et al., 2017), which combines content analysis based on a probabilistic topic model with collaborative filtering. The experimental results are reported to be satisfactory. In addition, Gupta et al. proposed an academic paper recommendation method based on multi-mode distributed representation learning (Gupta & Varma, 2017). The proposed method outperforms current distributed representation methods in accuracy and average accuracy, with reported figures of 29.6% and 20.4%, respectively.
Besides the hybrid models described above, we can also use several models to generate recommendations separately and then use some mechanism to mix the multiple recommendation results.
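One simple way to mix the results of several recommenders is a weighted sum of normalized scores. The sketch below assumes min-max normalization and fixed weights, which are illustrative choices rather than a mechanism taken from the cited works; the scores and names are hypothetical.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1] so that different recommenders are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def blend(score_lists, weights, top_n=3):
    """Weighted sum of normalized scores; a paper missing from a list contributes 0."""
    combined = {}
    for scores, w in zip(score_lists, weights):
        for paper, s in min_max(scores).items():
            combined[paper] = combined.get(paper, 0.0) + w * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


content_scores = {"p1": 0.9, "p2": 0.4, "p3": 0.1}  # hypothetical content-based scores
cf_scores = {"p2": 5.0, "p3": 3.0, "p4": 1.0}       # hypothetical collaborative scores
mixed = blend([content_scores, cf_scores], weights=[0.5, 0.5])
```

With equal weights, `p2` ranks first because both recommenders support it, ahead of `p1`, which only the content-based recommender scores highly. In practice the weights would be tuned on held-out data.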

Graph-based Recommendation
Graph-based recommendation is a special type of recommendation. Some scholars use the connections within the academic community to build networks that show the relationships between related papers (Baez et al., 2011; Küçüktunç et al., 2012), and some of these graphs also include authors and publication times (Lao & Cohen, 2010), users (Z. Huang et al., 2002), and other elements. Other graph-based paper recommendation methods are built on indicators such as bibliographic coupling strength (Woodruff et al., 2000) and co-citation strength (Zhou et al., 2008). According to these indicators, candidate sets can be found and the recommendation results ranked.
Later, to represent the many types of features of a paper, Pan used a heterogeneous graph-based feature representation method for academic papers (Linlin P., 2015) to design and implement an academic paper recommendation system based on heterogeneous graphs. The system is able to recommend relevant academic papers accurately and efficiently based on the target papers entered by the researcher. In 2018, Ma et al. systematically studied the problem of new papers in heterogeneous bibliographic networks and proposed a new method, HIPRec (Ma et al., 2018), a recommendation model based on meta-graphs, which they tested on the DBLP network dataset. The experiments demonstrate the effectiveness and efficiency of the method compared with the state-of-the-art methods.
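The two citation-graph indicators named above can be computed directly from a citation list. The following sketch uses the standard definitions (bibliographic coupling counts shared references; co-citation counts how often two papers are cited together) on a hypothetical toy graph; it is not the implementation of any cited system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy citation graph: citing paper -> set of papers it cites.
cites = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3"},
    "C": {"r3", "r4"},
}


def coupling_strength(p, q, graph):
    """Bibliographic coupling: number of references shared by papers p and q."""
    return len(graph.get(p, set()) & graph.get(q, set()))


def cocitation_counts(graph):
    """Co-citation strength: how often each pair of papers is cited together."""
    counts = Counter()
    for refs in graph.values():
        for a, b in combinations(sorted(refs), 2):
            counts[(a, b)] += 1
    return counts


def rank_by_coupling(target, graph):
    """Candidate set = every other citing paper, ranked by coupling with the target."""
    scored = [(p, coupling_strength(target, p, graph)) for p in graph if p != target]
    return sorted(scored, key=lambda x: x[1], reverse=True)


ranking = rank_by_coupling("A", cites)
cocited = cocitation_counts(cites)
```

In this toy graph, `B` shares two references with `A` and `C` shares one, so `B` is ranked first; the pair (`r2`, `r3`) is co-cited by both `A` and `B`.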

Recommendation based on Deep Learning
Deep learning is an important research direction in the field of machine learning. With the rapid improvement of computing power and storage capacity, it has in recent years achieved breakthroughs in image processing, natural language processing, speech recognition, and online advertising. Many recommender system researchers have therefore turned their attention to deep learning, which is highly praised for its cutting-edge techniques and high-quality recommendations. Compared with traditional recommendation models (e.g. content-based recommendation, collaborative filtering, hybrid recommendation), deep learning can mine user requirements, item features, and users' historical interaction records better and more deeply. On the one hand, by learning a deep nonlinear structure, it can represent the massive data related to users and items. On the other hand, through automatic feature learning from multi-source heterogeneous data, those data can be mapped into a unified latent space.
Scholars have carried out extensive theoretical and empirical studies of recommender systems based on deep learning. The application of deep learning to academic paper recommendation is still in its infancy, but there are some early explorers. Ebesu et al. proposed the neural citation network (NCN) architecture (Ebesu & Fang, 2017) to solve the problem that the traditional bag-of-words representation easily loses semantic information and cannot integrate metadata. Hassan proposed an academic paper recommendation model (Hassan, 2017) based on recurrent neural networks (RNNs), which can better discover the deep latent semantic features of documents and improve the quality of paper recommendation. For the cold start problem in recommendation, Wang et al. used a hybrid model (Yan & Jie, 2018) combining collaborative filtering and content-based recommendation, with a deep learning method used in the content-based part. The experimental results show that, compared with other baseline methods, the accuracy of the method increases significantly, by 4%. It is foreseeable that academic paper recommendation based on deep learning will be one of the key research directions for researchers.

Evaluation Metrics and Datasets
To verify the effect of a recommendation model, researchers select evaluation metrics and datasets for evaluating and validating it. This section reviews the commonly used evaluation metrics and popular datasets for academic paper recommendation models as a guide for interested researchers.

Evaluation Metrics
The evaluation of a recommender system is mainly divided into online evaluation and offline evaluation. Of the two, offline evaluation is the first choice for most recommender systems because of its convenience. Its core is to calculate the effect of the recommender system on real annotated datasets. In the academic paper recommendation scenario, researchers usually select metrics such as precision, recall, and F1 for offline testing (see Table 2).


• Precision: the degree of agreement between the recommended items and the user's true choices, i.e. the proportion of recommended papers that the user is actually interested in.
• Recall: the probability that an item the user actually likes is recommended by the system. In a paper recommendation system, it can be understood as the probability that the relevant literature the user is actually interested in is recommended by the system.
• F1-Score: also known as F-Measure, it balances precision and recall.
• MRR (Mean Reciprocal Rank): it focuses on the position of the first relevant item in the actual recommendation list.
• NDCG (Normalized Discounted Cumulative Gain): originating from the field of information retrieval, it is a measure of ranking quality that is suitable for Top-N recommendation evaluation.
• HR (Hit Ratio): a commonly used measure of recall in Top-K recommendation.
• Others: other metrics include non-accuracy indicators such as coverage and diversity.
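The metrics listed above can be made concrete with a short sketch. It uses the common binary-relevance definitions and, for HR, assumes a leave-one-out protocol with one held-out test item per user; these conventions, the function names, and the toy data are assumptions for illustration, and individual papers may define the metrics slightly differently.

```python
import math


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one ranked list against a set of relevant papers."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item; MRR is the mean of this value over users."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary relevance and a log2 position discount."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


def hit_ratio_at_k(rec_lists, held_out, k):
    """Share of users whose single held-out test item appears in their top-k list."""
    hits = sum(1 for user, item in held_out.items()
               if item in rec_lists.get(user, [])[:k])
    return hits / len(held_out) if held_out else 0.0


# Hypothetical example: one ranked list and the papers the user actually found relevant.
recommended = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2", "p9"}
p, r, f1 = precision_recall_f1(recommended, relevant)
rr = reciprocal_rank(recommended, relevant)
ndcg = ndcg_at_k(recommended, relevant, k=4)
hr = hit_ratio_at_k({"u1": recommended, "u2": ["p5", "p6"]},
                    {"u1": "p1", "u2": "p8"}, k=2)
```

In this example two of the four recommended papers are relevant, so precision is 0.5 and recall is 2/3; the first relevant paper sits at rank 2, giving a reciprocal rank of 0.5.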

Datasets
In the previous section, we summarized the commonly used evaluation metrics for academic paper recommendation models. Here we introduce some popular datasets. At present, the datasets used for evaluating academic paper recommendation mainly include CiteSeer, CiteULike, DBLP, PubMed, SPRD, and others. These datasets are publicly released by academic institutions or individuals and need some preparation, e.g. preprocessing and format conversion, before use (see Table 3).

Future Research Insights
Here, we propose three future research directions for those who are interested in the field of personalized academic paper recommendation: 1) Combine deep learning with traditional recommendation methods. Traditional recommendation methods suffer, to varying degrees, from drawbacks such as cold start and data sparsity. Furthermore, their feature construction relies on manual work, whereas deep learning can learn deep representations of items and the implicit relationships between users and items, and can integrate heterogeneous data (Ma et al., 2018). Thus, methods that combine deep learning with traditional recommendation methods have great potential and are of great significance.
2) Recommend various types of academic resources. In the process of scientific research, researchers have access not only to academic papers but also to lecture resources, conference resources, academic news, and videos. These academic resources should also be fully utilized by researchers. However, most academic resource recommender systems are oriented toward a single type of resource, such as academic papers. Therefore, how to integrate these different types of academic resources and recommend cross-type academic resources to users is an interesting and promising direction.
3) Cross-language academic paper recommendation. Because English serves as a universal language, the majority of academic papers on the Internet are in English. However, it is undeniable that academic papers in other languages are also influential and contain great academic value. Because of language barriers, these academic resources cannot be exposed to academic communities working in other languages, which leads to two problems: first, academic knowledge cannot be spread and shared; second, scholars cannot be as complete as possible when writing literature reviews. Encouragingly, some early explorers have already made strides (Jiang, Yin, Gao, Lu, & Liu, 2018). We should also highlight the importance of building a cross-lingual academic paper recommendation system.

Conclusion
This paper systematically summarizes and analyzes the current research progress of personalized academic paper recommendation from the perspectives of applied methods, evaluation metrics, and common datasets, and presents insights into the field. With the rapid development of cutting-edge technologies such as deep learning, personalized academic paper recommendation will surely see novel methods and models, and the recommendation results will better match the needs of users, providing a range of intelligent paper recommendation services. Through personalized academic recommendation, we can therefore effectively alleviate the problem of information overload in the academic field. It can help research users find papers of interest or potential interest and assist them in conducting academic research, which will benefit the development of scientific research.
the user has on the list of recommendations, |GT| is the set of all test sets

Table 2. Summary of Common Metrics

CiteSeer: the publicly available data of CiteSeer includes a public academic corpus, citation graphs, author data, and author collaboration networks. It is often used by researchers to evaluate academic paper recommender systems and has become the most frequently used dataset.
CiteULike: CiteULike is a well-known website for managing and discovering academic papers. It is mainly an online site where academic researchers save and share papers. It can automatically extract citation information for users and lets them build their own collections of papers of interest. Based on the user's collection of papers, the website recommends relevant papers. CiteULike can save, share, and organize the online academic articles and book information a user reads to form a personal database.
DBLP: DBLP is an integrated database system of English-language literature covering research results in the field of computer science. Each author's research results are listed by year, including papers published in international journals and conferences. DBLP does not provide collection or retrieval of Chinese literature. The DBLP support team has done a lot of work based on DBLP data, providing a variety of search, statistics, and other services, and releasing APIs and downloadable datasets. Many researchers harvest DBLP to build new datasets such as ArnetMiner and Semantic Scholar.
PubMed: PubMed is a free search engine that provides searches and abstracts of biomedical papers. Its database source is MEDLINE. Its core theme is medicine, but it also includes other areas related to medicine, such as nursing and other health disciplines. PubMed makes it easy for other researchers to conduct scientific research: it publishes all its datasets and establishes a dataset baseline every year. Researchers can obtain PubMed data through batch processing, a Web API, and other means.
There are also datasets such as the ACM English Corpus, Bibsonomy, CORE Projects, and more. Moreover, some scholars automatically create a new dataset from academic websites to test the effect of their model, and present the dataset, together with the proposed model, as one of their contributions.

Table 3. Summary of Popular Datasets