Stock Market Classification Model Using Sentiment Analysis on Twitter Based on Hybrid Naive Bayes Classifiers

Sentiment analysis has become one of the most popular processes for predicting stock market behaviour based on consumer reactions. Concurrently, the availability of data from Twitter has attracted researchers to this research area. Most sentiment analysis models still suffer from inaccuracies, and low classification accuracy has a direct effect on the reliability of stock market indicators. This study primarily focuses on the analysis of a Twitter dataset and proposes an improved model designed to enhance classification accuracy. The first phase of the model is data collection, and the second involves filtration and transformation, which are conducted to retain only relevant data. The third and most crucial phase is labelling, in which the polarity of the data is determined and negative, positive or neutral values are assigned to people's opinions. The fourth phase is classification, in which suitable patterns of the stock market are identified by hybridizing Naïve Bayes Classifiers (NBCs), and the final phase is performance and evaluation. This study proposes Hybrid Naïve Bayes Classifiers (HNBCs) as a machine learning method for stock market classification. The outcome is instrumental for investors, companies, and researchers, as it enables them to formulate their plans according to the sentiments of people. The proposed method achieved an accuracy of 90.38%.


Introduction
In recent years, the stock market has become one of the most popular domains for sentiment analysis on Twitter (Yan, Zhou, Zhao, Tian, & Yang, 2016). Investors need a classification model to reduce risk in decision making (Xu & Keelj, 2014). As a public data source, Twitter data are readily available and accessible to everyone. Firms can directly acquire the opinions of their customers from their tweets, and they can also predict customer sentiment from the acquired data. Twitter has more than 140 million active users who share more than 400 million tweets every day, sharing their thoughts with the public. They also talk about trending topics or share incidents and experiences to get views and feedback from other users. These tweets help users and organizations to collect valuable information in different domains (Atefeh & Khreich, 2015). Firms are therefore taking a further step by using Twitter as a data source, as it is conveniently available and accessible. Data from tweets cover different kinds of events and news, as well as economic data such as stock markets, financial data associated with market indicators, and e-commerce and marketing statistics (Geser, 2011; Thompson, 2008).
This study discusses the methods, models, and techniques that help stock market investors make low-risk decisions using the sentiments of people. Many new models have been developed, but each has its advantages and disadvantages. Current stock market classification models still suffer from low classification accuracy (L. Zhang, 2013; Meesad & Li, 2014; Skuza & Romanowski, 2015; Navale, Dudhwala, Jadhav, Gabda, & Vihangam, 2016; Arvanitis & Bassiliades, 2017). This weakness has a direct effect on the reliability of stock market indicators, such as the "series of statistical figures" and "financial reports" that explain stock behaviour (Bollen, Mao, & Pepe, 2011; J. Lin & Ryaboy, 2013; Ludwig et al., 2013).
Features of data, labelling techniques, and data classification techniques are among the factors that affect the accuracy of classification model results (Jiang, Wang, Cai, & Yan, 2007; Sathyadevan, Sarath, Athira, & Anjana, 2014; Yang, Zhang, Pan, & Xiang, 2015; Alkubaisi, Kamaruddin, & Husni, 2017). This study focuses on these three factors as they concern stock market classification. Features such as the size of keywords and the period of data collection are examples of the first factor (Iacomin, 2016; Shah et al., 2016). One labelling technique, self-training, uses prior lexical knowledge to decide whether the polarity of a keyword is positive, negative or neutral (Zhou, Zhang, & Zimmerman, 2011; Hasbullah, Maynard, Chik, Mohd, & Noor, 2016). To produce significant results, a suitable machine learning classifier is used in classification (Padmaja & Fatima, 2013). This study focuses on these factors because they are essential in the domain of the stock market and may have a direct impact on the accuracy and reliability of the stock market classification model (Jiang et al., 2007; Sathyadevan et al., 2014; Yang et al., 2015; Giatsoglou et al., 2017). This analysis will help companies examine their customers' views and feedback on the services and products they provide, and hence improve their services in future. This will not only help them attract new customers but also assist them in retaining current ones. Investors can likewise get feedback on the status of the companies whose stock they may invest in.
This study uses sentiment analysis of tweets to generate a stock market classification model, which helps improve the accuracy of decision making. The research also covers the analytical techniques that affect the accuracy of classification of consumer reactions on Twitter. Accurate classification in the stock market domain needs specific data with particular features and polarities. The analytical approach therefore prepares, through the process of data analysis, appropriate data with specific features and polarities to achieve high accuracy with the reliability required for classification (Schumaker & Chen, 2009; Larsen, 2010).
Furthermore, researchers have used several classifiers to perform sentiment analysis on stock market data sources. This study performs sentiment analysis on Twitter by proposing HNBCs to produce a stock market classification model that increases classification accuracy and reliability in support of decision makers in the field of the stock market. The proposed method combines three algorithms: Baseline Naïve Bayes (NB), Multinomial Naïve Bayes (MNB) (Tan & Zhang, 2008), and Semi-Supervised Naïve Bayes (SSNB) (Bhattu & Somayajulu, 2012). This study hybridizes these three algorithms for the following five main reasons: a. Assuming feature independence (Chandrasekar & Qian, 2016).
This study also proposes two essential features, namely temporal and spatial features in relation to the stock market, to increase the accuracy of stock market classification and support sound decision making by investors and business people (Adam, Marcet, & Nicolini, 2016). Additionally, expert labelling is employed to improve the output of the labelling step because every stock market has its own specific feedback. Irrelevant words are dropped, and the focus is kept on the goal of the classification model, in order to increase the exactness of the input data and obtain more relevant data before classification (Al-Ayyoub, Essa, & Alsmadi, 2015).
The rest of this study is organized as follows: Section 2 discusses related work on the relationship between stock market classification and sentiment analysis by explaining feature selection, the labelling technique, and classification. Section 3 describes the proposed methodology. Section 4 presents the experiment and discussion. Section 5 discusses the recommendations and future work. The final section concludes the study.

Feature Selection
A feature is an individual measurable property of a phenomenon being observed (Bishop, 2006). Feature selection is an essential factor in improving the classification accuracy of any stock market classification model (Iacomin, 2016; Tang, He, Baggenstoss, & Kay, 2016; Yousef, Saçar Demirci, Khalifa, & Allmer, 2016; Lee & Kim, 2017). Zhang (2013) and Makrehchi, Shah, and Liao (2013) focused primarily on the time feature, that is, the time at which data was collected from Twitter for sentiment analysis. These researchers found a significant correlation between stock returns and individuals' reactions. Valuable data in the stock market domain should include several features such as time, location, targeted audience, brand, and kind of service, but the most important for decision makers who are looking to invest in the stock market are time, brand and location (Janssen, van der Voort, & Wahyudi, 2017; Alkubaisi et al., 2017).
In this study, two features are selected to achieve more accurate classification results: the temporal and spatial features. The spatial feature represents tweets based on location, and the temporal feature represents tweets based on timestamp (Song & Xia, 2016). It is important to note that data without a timestamp and location cannot support decision makers in the field of the stock market (Ruth & Hannon, 2012; Fernández-Avilés, Montero, & Orlov, 2012; Song & Xia, 2016).

Labelling Technique
An automatic labelling technique is often used in the tagging step; this, however, affects the accuracy of classification results. The technique automatically identifies the sentiment expressed in a tweet based on a prior general lexicon that is not specifically relevant to the research area (Zhou et al., 2011). Examples of such research areas include stock market performance (Bollen, Mao, & Zeng, 2011), crime prediction (X. Wang, Gerber, & Brown, 2012) and tourism information (Shimada, Inoue, Maeda, & Endo, 2011). According to Ahuja, Rastogi, Choudhuri, and Garg (2015), automatic labelling is still unable to capture real public moods since the data is usually collected directly from logs, so the resulting labels need to be studied and tabulated to represent the actual users' sentiments. The logs include both relevant and irrelevant data about users' tweets. When research focuses on a specific area, it becomes necessary to focus on data pertinent to that area. Attigeri, MM, Pai, and Nayak (2015) suggested that the automatic labelling step could be improved to give better sentiment calculations and so achieve the dataset required to enhance classification accuracy. To that end, the researchers used a cumulative assessment of the sentiments of news articles or tweets for sentiment analysis. Adding to the automatic labelling problem, F. Wang, Zhang, Xiao, Kuang, and Lai (2015) stated that this sentiment calculation lacks a specific dataset reflecting real feedback from consumers. Real consumer feedback will improve the accuracy of the final result of the stock market classification model.
Existing stock market classification models that use auto-labelling methods are based on prior lexical knowledge; they do not have a specific lexicon prepared by experts before the sentiment analysis. Therefore, if the prior lexical knowledge does not match the domain of classification, the model will not achieve the required classification accuracy because it will not assign an accurate polarity weight to the sentence (Giatsoglou et al., 2017; Jha & Mahmoud, 2017).
The labelling step should therefore match the domain of the feedback and the aim of the classification precisely (Jha & Mahmoud, 2017). Expert analysts can accurately fit the input to the area of the stock market because they know the real polarity of each keyword (Liu, 2012). In addition, an integrated dataset of relevant words requires features and actual polarity in order to train a more suitable learning algorithm, which is then used to extract patterns appropriate for the research (Miranda & Abreu, 2015; Hasbullah et al., 2016).

Classifiers
At present, supervised machine learning methods such as NBCs are widely used in the domain of online data classification, in line with the rapid growth of internet users (Fiarni, Maharani, & Pratama, 2016). Sentiment analysis is a process that uses computational linguistics, Natural Language Processing (NLP), and text mining to identify text sentiments as positive, negative or neutral. The technique is also known in the text mining field as emotional polarity analysis, opinion mining and review mining (Mostafa, 2013). In addition to calculating sentiment scores, the sentiment acquired from the text is compared to a dictionary definition in order to determine the strength of that opinion. Studies on sentiment analysis, and resources such as sentiment lexicons, focus on text written in English, whilst applying them to other languages could generate a domain adaptation problem (Cambria, Schuller, Xia, & Havasi, 2013).
Defining the polarity of words is not straightforward because it depends on the domain, such as the stock market, health, or education (Padmaja & Fatima, 2013). The sentiment analysis technique affects the training set before classification: preparing the dataset and detecting its polarity helps to improve classification accuracy (Abdelwahab, Bahgat, Lowrance, & Elmaghraby, 2015).
Machine learning approaches to sentiment analysis, such as NBC, are widely used in the field of online data classification (Islam, Wu, Ahmadi, & Sid-Ahmed, 2007; Nagwani & Verma, 2014). The rapid growth of internet users and the need to analyse consumers' reactions on social networks have led to patterns that support investors' decisions in the domain of stock market classification (T. H. Nguyen, Shirai, & Velcin, 2015). Analyzing consumers' reactions also helps to represent valuable data: information and knowledge from social media, such as tweets, are extracted and analyzed to build useful indicators of available datasets (Kim, Jeong, & Ghani, 2014). The baseline NBC needs to define two variables for different classifiers, the first for documents and the second for tokens. NBC uses these variables separately, which increases the training time (Bartov, Faurel, & Mohanram, 2015).
cis.ccsenet.org Computer and Information Science Vol. 11, No. 1
NBCs are probabilistic supervised machine learning classifiers that apply Bayes' rule under the assumption that all features in the data are independent. This means that there is no relationship between the values of different features (Vinodhini & Chandrasekaran, 2012). Furthermore, NBCs are the most widely used supervised machine learning classifiers in text mining and data mining applications because they are simple and effective (Islam et al., 2007; T. H. Nguyen et al., 2015). NBCs have four models: Gaussian (GNB), Multinomial (MNB), Bernoulli (BNB), and Semi-Supervised (SSNB). Hybridizing the algorithms of these classifier models with different numbers of parameters and features can lead to optimization (Nguyen & Armitage, 2008; Aggarwal & Zhai, 2012).
This study adopts and combines three Bayes algorithms to formulate a better algorithm known as HNBC for the stock market classification model. The combined algorithms include NB, MNB, and SSNB.
Finally, NBCs have many advantages. One is fast training with fast results: NBCs do not require much training, and simple training data is adequate to build a model and make a classification with low storage requirements (Netti & Radhika, 2015; Kharde & Sonawane, 2016). Another is the assumption of feature independence, where features represent the attributes or characteristics of the data. With NBCs, each feature is calculated independently, which can raise accuracy (Chandrasekar & Qian, 2016).
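The independence assumption can be written out explicitly. The following is the standard naive Bayes decision rule, not a formula taken from this paper; the symbols c (a sentiment class), x_i (the i-th feature of a tweet) and n (the number of features) are notation introduced here for illustration:

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\text{positive},\ \text{negative},\ \text{neutral}\}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form is what "each feature is calculated independently" amounts to: every feature contributes its own likelihood term, and the evidence term P(x_1, ..., x_n) is dropped because it is the same for every class.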

The Proposed Model
The proposed model breaks down the implementation process of generating an enhanced classification model for the stock market. Model development begins with the data collection phase (tweet datasets). It is essential to ensure that the tweets collected from Twitter are well pre-processed. The other phases implemented in this model are the expert labelling phase, the classification phase using HNBCs, and the performance evaluation phase. The model in Figure 1 consists of five phases. First, the dataset is collected from Twitter using the Twitter Application Program Interface (API), because streaming can provide a continuous stream of information with updates (Go, Bhayani, & Huang, 2009). Second, the dataset is pre-processed using NLP (Abdelwahab et al., 2015). NLP processing starts with the extraction process, which extracts the required features (Yang et al., 2015), and ends with the transformations step (Kouloumpis, Wilson, & Moore, 2011). Third, the expert labelling technique is applied, whereby the datasets are classified into positive, negative and neutral polarity. Fourth, classification is performed using HNBCs. Finally, the performance of the classification model is evaluated using F-measure, recall, precision, support, and accuracy.

Phase One: Data Collection (Twitter API)
This study focuses on tweets that reflect consumer reactions to Al-marai Saudi Arabia (@almarai). Twitter offers two types of API for gathering tweets, namely the Twitter REST API and the Twitter Streaming API (Go et al., 2009). This study utilizes an internal library that uses the REST API with Application-Only Authentication (OAuth), as required by the Twitter4j library. Twitter supports OAuth to provide authorized access to its API, and the Twitter4j library uses it (Yamamoto, 2014). The Streaming API can provide a continuous stream of information with updates and can access real-time tweet data using queries. This study considered both the Search API and the Streaming API for Twitter sentiment analysis. It chooses the Search API, which is a REST API, since it enables users to retrieve recently posted tweets matching specific queries using HTTP methods (Makice, 2009). Moreover, it can filter results based on time, region, and language (Li, Lei, Khadiwala, & Chang, 2012).
Queries return a list of JSON objects containing tweets and metadata. These objects include the username, location, time and re-tweets (Aramaki, Maskawa, & Morita, 2011).
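As a minimal sketch of how such a response might be flattened, the snippet below parses a hand-written sample and pulls out the four metadata items named above. The field names (`created_at`, `retweet_count`, `user.screen_name`, `user.location`) follow the Twitter REST API v1.1 tweet object as an assumption; the sample values and the function name `extract_fields` are invented for illustration, and the study itself used the Java Twitter4j library rather than Python.

```python
import json
from datetime import datetime

# Hypothetical sample shaped like a list of v1.1 tweet objects (values invented).
SAMPLE = """
[{"created_at": "Sun Sep 18 09:15:00 +0000 2016",
  "text": "example tweet text",
  "retweet_count": 3,
  "user": {"screen_name": "example_user", "location": "Riyadh"}}]
"""

def extract_fields(raw_json):
    """Return one flat record per tweet with username, location, time and re-tweets."""
    records = []
    for tweet in json.loads(raw_json):
        user = tweet.get("user", {})
        records.append({
            "username": user.get("screen_name"),
            "location": user.get("location"),
            # Assumed v1.1 timestamp layout, e.g. "Sun Sep 18 09:15:00 +0000 2016".
            "time": datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
            "retweets": tweet.get("retweet_count", 0),
            "text": tweet.get("text", ""),
        })
    return records
```

Parsing the timestamp into a `datetime` at this stage makes the later time-ordering of tweets a plain sort.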

Phase Two: Data Pre-processing
The collected tweets need to undergo pre-processing to prepare them for data labelling and to improve the classification result (W. A. Hussien et al., 2016). This study adopts all the steps of the pre-processing method: text cleaning, removing white space, extracting abbreviations, removing stop words, and negation handling. The last process in the pre-processing method extracts all relevant content and the required features from the tweets and drops the irrelevant content. This is called the filtering step, and all the steps before it are called transformations (Abdelwahab et al., 2015). The two main categories of the pre-processing method are explained below:
a. Transformations step (Kouloumpis et al., 2011):
1. Tokenizing text by splitting it on spaces.
2. Removing stop words such as "or" and "also".
3. Eliminating URLs, usernames, hashtags, Twitter symbols, punctuation and references.
4. Reducing redundant letters, such as "cooool" to "cool".
b. Filtering step: This step is related to the content extraction process applied to the collected tweets after the transformations step (Yang et al., 2015). This study focuses on extracting two essential features, the spatial and temporal features of tweets. There are two ways to obtain spatial information about tweets: the first is to collect automatically the accurate spatial information available on Twitter, and the second is to infer the approximate location of the user from the user profile (Chandra, Khan, & Muhaya, 2011; Jiang, Cai, Zhang, & Wang, 2013). This study also uses shapelet temporal selection. This type of feature selection assumes that the time series are independent and identically distributed (H. Wang, Wu, Zhang, & Zhang, 2016). Figure 2 shows the data pre-processing steps.
Figure 2. Data pre-processing phase
For the temporal feature, we selected the timestamp of each tweet. Our analysis showed that tweet time and tweet class are not correlated with each other. Therefore, it does not make sense to include the tweet time in the feature set; in a more advanced analysis, however, we found that the class of the preceding tweet and the class of the current tweet are correlated.
To implement the temporal relation in tweet analysis, we ordered all tweets by time and then added the class name of the immediately preceding tweet to the current tweet. For the spatial feature, we selected a location id for each tweet instead of the location itself. It is not possible to obtain the location of tweets from Twitter for academic purposes, but we can simulate the tweet location as a fixed location id that depends on the location of the company and of the beneficiaries of the company's products and services. This approach assumes that every individual tweet location has a unique location id and that the location correlates with the tweet's sentiment class.
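The transformation steps and the two simulated features described in this phase can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the stop-word list, the `FIXED_LOCATION_ID` value, the `"none"` placeholder for the first tweet and both function names are assumptions, and the steps are reordered so that URLs are removed before punctuation is stripped.

```python
import re

# Tiny illustrative stop-word list; a real list would be language-specific.
STOP_WORDS = {"or", "also", "and", "the", "a"}

# Hypothetical fixed id standing in for the company's location.
FIXED_LOCATION_ID = 1

def transform(text):
    """Strip Twitter markup, squeeze repeated letters, tokenize, drop stop words."""
    text = re.sub(r"https?://\S+", " ", text)     # URLs
    text = re.sub(r"[@#]\w+", " ", text)          # usernames and hashtags
    text = re.sub(r"\bRT\b", " ", text)           # retweet marker
    text = re.sub(r"[^\w\s]", " ", text)          # punctuation
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)   # runs of 3+ letters become 2
    tokens = text.lower().split()                 # tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]

def add_context_features(tweets):
    """Order tweets by time; attach the preceding tweet's class and a location id."""
    ordered = sorted(tweets, key=lambda t: t["time"])
    previous = "none"  # the earliest tweet has no predecessor
    enriched = []
    for tweet in ordered:
        enriched.append({**tweet,
                         "prev_class": previous,
                         "location_id": FIXED_LOCATION_ID})
        previous = tweet["label"]
    return enriched
```

For example, `transform("RT @almarai: cooool offer!!! see https://t.co/abc #milk")` yields the tokens `cool`, `offer` and `see`. Because Python's `\w` is Unicode-aware for `str` patterns, the same expressions leave Arabic letters intact.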

Phase Three: Expert Labelling Technique
This study advocates the expert labelling technique to define the polarity (positive, negative or neutral) of the data after the pre-processing phase. Labelling by experts with experience in the stock market domain plays the primary role in improving the polarity result, which in turn raises classification accuracy (Kouloumpis et al., 2011). Table 1 shows an example of the expert labelling technique.
Table 1. Examples of expert labelling and defining the polarity

Tweet | Polarity | Description of Sentiment
Demand declined on dairy products this month. | Negative (2) | This news will affect the related shares negatively.
Low fuel prices. | Positive (1) | This news has a positive effect on different kinds of investment. For example, in the domain of car trading, this will lead to increasing demand of car purchasing.
The USA rap singer felt bored when he was watching the color of the chocolate tray. | Neutral (0) | This news does not have any effect on the shares related to the chocolate companies.

Phase Four: Classification
This study applies a more specific classification method to identify a suitable pattern for the stock market classification model by hybridizing NBCs. NBCs have four parameter estimators, namely GNB, MNB, BNB, and SSNB. The hybridization of NBCs in this study combines three algorithms: Baseline Naïve Bayes (NB), MNB, and SSNB. These three classifiers were chosen because their characteristics match the requirements. For instance, MNB handles more than one feature at the same time, and this study focuses on different types of features, namely spatial and temporal. Furthermore, SSNB is suitable for a small labelled dataset, and this research deals with a small dataset of 3246 tweets. The pseudo code of the HNBCs algorithm is presented as follows:

HNBCs:
1: Create a frequency table for all the features in the train set against every single class in every single document. Extra class refers to unknown data.
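Only the first step of the pseudo code is available here, so the sketch below covers that step alone (a per-class frequency table) and then, purely as an assumption about how such a table is typically used, scores a new document with a standard multinomial naive Bayes rule and Laplace smoothing. It is not the authors' HNBCs algorithm, it does not model the extra "unknown" class, and the function names and the `alpha` parameter are introduced for illustration.

```python
import math
from collections import Counter, defaultdict

def build_frequency_table(documents):
    """documents: list of (tokens, label). Returns per-class token counts and class counts."""
    token_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in documents:
        class_counts[label] += 1
        token_counts[label].update(tokens)
    return token_counts, class_counts

def predict(tokens, token_counts, class_counts, alpha=1.0):
    """Return the class with the highest log prior plus smoothed log likelihoods."""
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        counts = token_counts[label]
        # Laplace smoothing: add alpha to every count so unseen tokens get a nonzero probability.
        denom = sum(counts.values()) + alpha * len(vocabulary)
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.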

Phase Five: Performance and Evaluation
The purpose of the empirical evaluation is to measure the accuracy of HNBCs. The measurements used in this evaluation to compare the proposed method with baseline NB classifiers include recall, precision, F-measure, support, and accuracy.
The measures are defined below, where a, b, c and d denote the true negatives (TN), false positives (FP), false negatives (FN) and true positives (TP) of the confusion matrix, respectively:
• Recall (also called sensitivity or true positive rate) is the proportion of positive cases that were correctly identified. It is defined as TP / (TP + FN) = d / (c + d).
• Support is defined as the number of occurrences of each class.
• Accuracy is the proportion of the total number of classifications that are correct. It is given as (a + d) / (a + b + c + d) = (TP + TN) / (TP + FP + FN + TN).
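These measures can be computed directly from the four confusion-matrix counts. In the sketch below, recall and accuracy follow the definitions given above; the precision and F-measure formulas are the standard ones (TP / (TP + FP) and the harmonic mean of precision and recall), supplied here as an assumption, and the function name `class_metrics` is illustrative.

```python
def class_metrics(tp, fp, fn, tn):
    """Compute the evaluation measures for one class from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # standard definition (assumed)
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    support = tp + fn  # number of true occurrences of the class
    return {"recall": recall, "precision": precision, "f_measure": f_measure,
            "accuracy": accuracy, "support": support}
```

For a three-class problem such as positive, negative and neutral, the counts would be taken one class at a time, treating that class as "positive" and the other two together as "negative".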

Experiment
This study uses an experimental test to generate knowledge from social networks, with the aim of achieving a stock market classification model with high accuracy and reliability.

Collected Tweets and Pre-Processing
A collection of tweets was compiled over a period of eight months, from 18-September-2016 to 25-May-2017. The dataset comprises the latest 3246 tweets related to the Twitter account of the Al-marai company, which currently produces the best dairy products in Saudi Arabia and the Arab Middle East countries (Euromonitor, 2016). Table 2 shows a random sample of the collected tweets after pre-processing. All tweets are in the Arabic language.

Expert-Labelling Technique
After tweet collection and pre-processing, the tweets were categorized into three categories (1: Positive, 2: Negative and 0: Neutral). The classification of a tweet depends on its impact. If a tweet has positive significance for the stock market, it is labelled as positive; if it has negative significance, it is labelled as negative; all other tweets are treated as neutral. This phase was run by two experts, both specialists in the domain of finance and marketing.
The expert labelling depends on the relationship between the tweet text and stock market concepts. After the expert labelling, a word net is generated, which represents the words correlated with the field of the stock market. Table 3 shows the tweets after the expert labelling step.

Performance and Evaluation
This section shows the results of applying the proposed model, the benchmark, and the discussion. Table 4 shows the class precision, recall, and F1-score for HNBCs: 90% for precision, 90% for recall and 90% for the F1-score. Table 5 shows the results of the performance and evaluation phase after applying MNB to a dataset with the following attributes:

Benchmark
• 10% of tweets were labelled by the experts
• 90% of tweets were labelled using auto-labelling (lexical-based)
• Spatial feature = False
• Temporal feature = False
MNB accuracy: 82.53%.
Table 5 shows the class precision, recall, and F1-score for MNB: 83% for precision, 83% for recall and 82% for the F1-score.
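To put the two reported accuracies side by side, the arithmetic below uses only the figures stated in this study (90.38% for HNBCs and 82.53% for the MNB benchmark). The error-rate framing is added here for illustration and is not part of the original evaluation:

```latex
\begin{aligned}
\text{accuracy gap} &= 90.38\% - 82.53\% = 7.85 \text{ percentage points} \\
\text{error}_{\text{HNBC}} &= 100\% - 90.38\% = 9.62\% \\
\text{error}_{\text{MNB}} &= 100\% - 82.53\% = 17.47\% \\
\text{relative error reduction} &= \frac{17.47 - 9.62}{17.47} \approx 0.449
\end{aligned}
```

Since the benchmark run also differs in its labelling (mostly automatic) and its features (no spatial or temporal feature), this gap reflects the whole pipeline and cannot be attributed to the classifier alone.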

Discussion
This study achieved a high accuracy of 90.38% for HNBCs across all classes (positive, negative and neutral). These results can enable decision makers and investors in the domain of stock market exchange to make lower-risk decisions because the results depend on facts about the stock market domain, such as the need for spatial and temporal features and the role of stock market experts in achieving real sentiment analysis. High classification accuracy with real sentiment analysis will lead to accurate and reliable reports and indicators on a company's stock. Moreover, simulating the assumptions of stock market decision making, such as the importance of the timestamp and location features, adds reliability to the classification results.
For example, if a classification model has an accuracy of 75%, any decision based on its analysis carries a 25% chance of inaccuracy. From this simple example, we can infer that decision making in any field needs high classification accuracy, together with a simulation that matches the concepts and assumptions of that research domain.
Finally, the results in Table 4 and Table 5 show that machine learning methods that use sentiment analysis on Twitter, such as NB classifiers, produce high and reliable accuracy when the domain features are simulated and the dataset is prepared using NLP methods.

Recommendations and Future Work
We recommend including spatial and temporal features in stock market classification models because they increase the importance and value of the generated information and raise the reliability of the produced reports and indicators. At the same time, expert labelling reduces the risk of decision making in the domain of stock market exchange because it makes the sentiment analysis more realistic, which is necessary for supervised machine learning methods.
For future work, we intend to add more related spatial and temporal features, depending on the decision-making concepts and fundamentals of the field of stock market exchange, while improving the automated-expert labelling system by generating an appropriate lexical model that can update all related keywords and vocabularies.

Conclusion
The model presented in this study has five steps: data collection, filtration, determination of polarity according to the sentiments of people, classification by enhancing NBCs, and performance and evaluation. The model works to increase classification accuracy so that decision-makers in the domain of stock market exchange, and investors who are looking for more investment in the stock market, can make more accurate and precise decisions. It also reduces people's apprehension about the reliability of the model. The method is founded mainly on a sentiment analysis approach that employs the expert labelling technique and two features, namely spatial and temporal. This study proposes HNBCs as a machine learning method for stock market classification. The hybrid algorithm has been adapted from NB and incorporates two different NB algorithms based on their specific functionalities.