The Study of Semantic Analysis on Intelligence Research under the Environment of Big Data

Faced with large and complex masses of data, how to find the information we need and then carry out intelligence research is an issue of concern to the intelligence community. This paper analyzes the significance of intelligence research and three technologies that help ensure its rigor: visualization, data mining, and semantic analysis. It focuses on the application of semantic analysis to intelligence research, taking two methods as examples: semantic role labeling and semantic-based text orientation analysis. For each method it describes the meaning, the semantic databases, the basic workflow, the strengths and weaknesses, and the development, and it offers an outlook for their use in information research.


Introduction
With the development of cloud computing, networking, online social media, and other emerging technologies, more and more data appear in our lives, and their volume is growing explosively. These massive data mark the arrival of the era of Big Data, in which digital information of every kind is widely available. The problem we should focus on is how to obtain better, more accurate, and richer information from so much data. Research in information science has also changed considerably in the era of Big Data. The research objects, research environment, and research methods and tools of traditional informatics have undergone a qualitative leap in the big data environment, giving rise to many new areas. Many traditional research methods are no longer suited to the new context. Intelligence workers must therefore find new ways to meet research needs in the big data environment.

The Concept of Big Data
Big data suggests large data sets. The research institution Gartner defines Big Data as a massive, fast-growing information asset that requires new processing modes to deliver stronger decision-making power, insight, and process-optimization capability. Wikipedia defines big data as information whose volume is too large to be acquired, managed, processed, and organized with current mainstream software tools for the purpose of supporting business decisions (Big Data, April 2016). McKinsey defines big data as data collections that cannot be collected, stored, managed, and analyzed with traditional database software tools within a given time.
Big data should be understood not merely as a matter of quantity but also in terms of its great complexity: the data grow continuously, cover a wide range of types, are exchanged frequently, and are related in complex ways.
According to the 2011 World Forum of Big Data, big data has the "4V" features: huge volume of data, varied types of data, low value density, and fast processing speed (Liu & Bai, 2014).

The Development of Intelligence Research in Big Data Environment
These features of big data give intelligence research a new development environment and further drive Internet, data mining, and cloud computing technologies toward maturity. As a result, trend forecasting makes up a growing share of intelligence research, and data analysis has been raised to an unprecedented height. Enhanced cross-border cooperation and the diversity of information and data also contribute to better use of data integration. As a result, under the background [...]
In the era of big data, data are generated very fast, and it is difficult to find valuable information among redundant data. It is like looking for a needle in a haystack, which reflects the low value density of the data. In this case, if we still rely on traditional semi-automatic techniques for data processing and integration, we will clearly be powerless. And for tracking and monitoring new types of data, although some technologies exist, human intervention is still unavoidable. Therefore, semantic analysis in the era of big data has become an inevitable trend and a necessity.

The Co
Semantic analysis has developed in many directions; here we highlight semantic role labeling and semantic-based text orientation analysis.

Semantic Role Labeling
In essence, semantic role labeling performs shallow semantic analysis at the sentence level. It groups the words of a sentence into sequences according to current grammatical knowledge and classifies those groups by semantic role. It does not attempt a detailed semantic analysis of the whole sentence; it only marks the semantic roles (arguments) of given predicates (verbs, nouns, etc.) so that a computer gains a "shallow" understanding of the text (Yang & Zhang, 2010). Semantic role labeling combines several natural language processing techniques, such as word segmentation, POS tagging, and syntactic analysis, so its study also provides a research platform for machine learning methods and the underlying technologies (Li, Sun & Li, 2011). Its advantages are a clearly defined task and convenient evaluation. Semantic role labeling plays a significant role in many applications: it can be used in question answering systems, information retrieval, machine translation, automatic summarization, and other natural language processing tasks.
Semantic labeling needs good support from a semantic database. At present, the well-known English semantic databases are FrameNet, PropBank, and NomBank. FrameNet, developed at U.C. Berkeley, annotates the British National Corpus on the basis of frame semantics. It tries to describe the frame of each predicate (verbs, nouns, and some adjectives) and the relationships between these frames. PropBank, developed at UPenn, adds shallow semantic information on top of the syntactic analysis of the Penn TreeBank. The difference between FrameNet and PropBank is that PropBank marks only verbs (not non-verbs); correspondingly, the verb is called the target verb. Semantic roles in PropBank fall into two categories: core semantic roles, ARG0-ARG5, and modifying semantic roles, ARGM. Typically ARG0 is the agent of the predicate's action and ARG1 is the object of the action, while ARG2-ARG5 have different meanings in different semantic frames. Whereas PropBank marks the verbs that act as predicates in the Penn TreeBank, NomBank marks only the nouns that act as predicates, but its categories of arguments and their representation are the same as in PropBank. It was developed to compensate for the coarse labeling that results from PropBank's using only verbs as predicates.
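The PropBank-style role inventory described above can be pictured as a small data structure. The following Python sketch is only an illustration: the class names `Argument` and `PredicateFrame` and the example sentence are assumptions introduced here, not part of PropBank or of any library.

```python
from dataclasses import dataclass, field

# Core roles named in the text: ARG0 .. ARG5. ARGM marks modifiers.
CORE_ROLES = {f"ARG{i}" for i in range(6)}


@dataclass
class Argument:
    """One labeled span (hypothetical helper class)."""
    role: str  # e.g. "ARG0", "ARG1", or "ARGM"
    text: str  # the words that fill the role

    def is_core(self) -> bool:
        return self.role in CORE_ROLES


@dataclass
class PredicateFrame:
    """A target verb together with its labeled arguments (hypothetical)."""
    predicate: str
    arguments: list = field(default_factory=list)

    def core_arguments(self):
        return [a for a in self.arguments if a.is_core()]

    def modifiers(self):
        return [a for a in self.arguments if a.role.startswith("ARGM")]


# Invented annotation of "The committee approved the budget yesterday."
frame = PredicateFrame(
    predicate="approved",
    arguments=[
        Argument("ARG0", "The committee"),  # agent of the action
        Argument("ARG1", "the budget"),     # object of the action
        Argument("ARGM", "yesterday"),      # modifying role
    ],
)

for arg in frame.arguments:
    print(f"{arg.role}: {arg.text}")
```

Keeping core roles and modifiers in one list, distinguished only by their label, mirrors the two-category division described in the text.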
Research on Chinese semantic labeling mainly uses three resources: the Chinese Proposition Bank (CPB), Chinese NomBank (Xue, 2006), and Chinese FrameNet (You & Liu, 2005). All of them support shallow semantic labeling of Chinese.
With a semantic database as the foundation, the next step is to learn from these existing semantic databases and automatically carry out semantic role labeling on various resources. The basic unit of an automatic semantic role labeling system may be a syntactic constituent, a phrase, a word, or a dependency relation. Generally speaking, each semantic role corresponds to a syntactic constituent, although the converse is not always true. Syntactic constituents are mainly used in phrase-based semantic role labeling systems, and most current systems tend to use them as the basic unit, which works well for English. For other languages, however, it is difficult to obtain such deep syntactic analysis automatically, and current syntactic analyzers perform poorly in the general domain. For this reason, some researchers have tried to base semantic analysis on shallow syntactic analysis, since shallow syntactic analysis is more widely applicable than deep syntactic analysis (Li, Sun & Li, 2011). Hacioglu used dependency relations as the basic unit for semantic role labeling (Hacioglu, 2004) and achieved similar results. Some scholars have also tried using words as the labeling unit, but the results are not as good as those obtained with phrases and syntactic constituents.
The first step in semantic role labeling is syntactic analysis, after which the predicates are identified on the basis of that analysis. The labeling process of a semantic role labeling system is divided into four steps: pruning, [...]
Because semantic orientation is carried mainly by adjectives and expresses an evaluative tendency, this method is rarely applied to topic-based text classification; it is mainly applied to classifying evaluative texts by the orientation of the views they express. Semantic-based research on text orientation analysis uses two kinds of methods. The first extracts the adjectives or subjectively colored phrases in a text, judges each extracted adjective or phrase and assigns it an orientation value, and finally adds up all of these values to obtain the overall orientation of the text. The second builds a library of orientation semantic patterns in advance, sometimes together with an orientation dictionary, then matches the document under test against the pattern library and obtains the orientation of the whole text by accumulating the values corresponding to all matched patterns (Yang, 2009).
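The first of the two methods (score each evaluative word, then add up the scores) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the lexicon `POLARITY`, its values, and the function names are invented for this example, and whitespace splitting stands in for real word segmentation and adjective extraction.

```python
# Hypothetical orientation lexicon: +1 commendatory, -1 derogatory.
POLARITY = {
    "good": 1, "excellent": 1, "reliable": 1,
    "bad": -1, "poor": -1, "slow": -1,
}


def text_orientation(text: str) -> int:
    """Sum the orientation values of all lexicon words found in the text."""
    # Crude tokenization: split on whitespace and drop edge punctuation.
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return sum(POLARITY.get(t, 0) for t in tokens)


def label(score: int) -> str:
    """Map the accumulated score to an overall orientation."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "The screen is excellent, but the battery is poor and slow."
score = text_orientation(review)
print(score, label(score))
```

In this example one commendatory word and two derogatory words are matched, so the accumulated score is negative. The second method would replace the word lookup with matching against a pattern library while keeping the same accumulation step.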
HowNet is a common-sense knowledge base that describes the concepts represented by Chinese and English words and reveals the relationships between concepts and between the attributes of concepts (HowNet, April 2016). It is the Chinese semantic database mainly used in semantic-based text orientation analysis. In HowNet, the description of Chinese words is based on the concept of the "original meaning" (sememe), which can be regarded as the most basic semantic unit of Chinese, one that cannot be subdivided further. Because the meanings of Chinese words are complex, the same word may have different meanings in different contexts, so in HowNet the meaning of a Chinese word can be understood as a collection of several sense items. In HowNet's semantic dictionary, each record consists of one sense item of a word and its description. Programs compute relative numerical values from the similarity and relatedness of these semantic descriptions. In HowNet, words also have a polarity: each word is given a measure of semantic orientation whose magnitude depends on how closely the word is associated with paradigm words. A paradigm word is a word whose evaluative attitude is clear, strong, and representative. The closer a word's relationship with commendatory paradigm words, the stronger its commendatory tendency; the closer its relationship with derogatory paradigm words, the stronger its derogatory tendency.
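The idea of measuring a word's orientation by its closeness to paradigm words can be illustrated as follows. This Python sketch does not use HowNet: the paradigm word lists, the similarity table `TOY_SIMILARITY`, and the scoring formula (mean similarity to commendatory paradigm words minus mean similarity to derogatory ones) are all assumptions chosen for illustration; a real system would compute similarity from a semantic resource.

```python
# Hypothetical paradigm words with a clear, strong attitude.
POSITIVE_PARADIGMS = ["good", "excellent"]
NEGATIVE_PARADIGMS = ["bad", "terrible"]

# Invented symmetric similarity scores in [0, 1].
TOY_SIMILARITY = {
    frozenset(("superb", "good")): 0.8,
    frozenset(("superb", "excellent")): 0.9,
    frozenset(("superb", "bad")): 0.1,
    frozenset(("superb", "terrible")): 0.1,
    frozenset(("awful", "good")): 0.1,
    frozenset(("awful", "excellent")): 0.1,
    frozenset(("awful", "bad")): 0.8,
    frozenset(("awful", "terrible")): 0.9,
}


def similarity(a: str, b: str) -> float:
    """Look up the toy similarity; unknown pairs count as unrelated."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def orientation(word: str) -> float:
    """Positive result: commendatory tendency; negative: derogatory."""
    pos = sum(similarity(word, p) for p in POSITIVE_PARADIGMS) / len(POSITIVE_PARADIGMS)
    neg = sum(similarity(word, n) for n in NEGATIVE_PARADIGMS) / len(NEGATIVE_PARADIGMS)
    return pos - neg


for w in ("superb", "awful", "table"):
    print(w, orientation(w))
```

The sign of the result gives the direction of the tendency and its magnitude gives the strength, which matches the qualitative rule stated in the text.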
Semantic-based text orientation analysis is applied in many fields of intelligence research. In the big data environment, people not only want to acquire the intelligence they are looking for but also need information about the attitudes expressed, so that they can base decisions on those attitudes. But semantic-based text orientation analysis also has drawbacks. For example, although usable semantic sentiment lexicons exist for English, mature and widely adopted ones are comparatively few. In Chinese, because the sentiment orientation of words is so complex, building a Chinese semantic sentiment lexicon poses many problems. Therefore, semantic-based text orientation analysis for Chinese needs to mature further, and Chinese sentiment orientation analysis should be studied further.

Conclusion
In the big data environment, intelligence research faces many opportunities and challenges. Although the large amount of data brings more information to intelligence research, the mixture of good and bad information poses a major challenge. The development of semantic analysis technology is a promising direction for intelligence research. Semantic role labeling and semantic-based text orientation analysis still need further development. However, as lexical and semantic technologies develop, these techniques will become increasingly favored by intelligence workers. We must also study the generality of semantic databases and analysis tools, so that they can be applied to more areas and disciplines.

the data are growing exponentially. The development of Internet technology not only promotes the generation of text data but also brings us unstructured data such as images, audio, and video. The content carried by these different types of data differs, as do their structures and storage.