Adaboost-SVM Multi-Factor Stock Selection Model Based on Adaboost Enhancement

In recent years, the application of machine learning techniques to improve traditional financial investment models has gained widespread attention from academia and the financial industry. This paper takes CSI 300 stocks as the research object, uses Adaboost to enhance the classification ability of the original linear support vector machine (SVM), and combines all major factors to build an Adaboost-SVM multi-factor stock selection model. In the backtesting analysis, the stock selection strategy of the original linear SVM is compared with the Adaboost-SVM multi-factor stock selection strategy. The results show that the Adaboost-SVM strategy delivers stronger profitability and smaller return fluctuation than the original algorithm model.


Introduction
Quantitative investment has been accepted and applied by the majority of investors due to its stable performance and rational investment approach. As one of the most widely used machine learning models in the financial field, the support vector machine (SVM) has been studied in depth by many scholars. Zhang (2016) proposed a new SVM-GARCH forecasting model and showed experimentally that it has better denoising and forecasting ability than the traditional ARMA-GARCH model for time series data. Huang (2017) used support vector machines to improve the traditional Fama-French three-factor model and build a new stock selection strategy. Through an empirical analysis of A shares, he showed that the new strategy model has stronger profitability and forecasting capability. In practice, however, linear support vector machines often suffer from weak classification ability. Dong et al. (2018) proposed the Adaboost-SVM algorithm, which uses SVM as the base classifier and Adaboost as the ensemble algorithm to further and effectively overcome the curse of dimensionality and local minima problems.

SVM Classification Algorithm
Generally, the classification problem can be expressed as follows. Consider a classification problem in $n$-dimensional space with $n$ indicators (i.e., $x \in R^n$) and $l$ sample points. The set of these $l$ sample points is $T = \{(x_1, y_1), \dots, (x_l, y_l)\}$, where $x_i \in R^n$ is the input indicator vector, or input, whose components are called features, or input indicators; $y_i \in \{1, -1\}$ is the output indicator, or output, $i = 1, \dots, l$. This set of $l$ sample points is called the training set. The question is then: given the training set and any new pattern $x$, is the corresponding output $y$ equal to 1 or -1?
Applied to stock factors, each factor is taken as one dimension, and the task is to find a rule that divides the points in $R^n$ into two parts. The problem above is a two-class classification problem. There are also multi-class classification problems; the difference is the number of possible outputs. This paper mainly studies the problem of dividing stocks into two categories using SVM. There are generally three types of classification problems, and different classifiers may be used for different types, as shown in Figure 1. The SVM algorithm is the process of finding the optimal hyperplane from the training samples. For two-dimensional coordinate points, the SVM algorithm looks for a straight line that separates the two classes of points. There are countless such lines, but if a line lies too close to the coordinate points, noise will strongly disturb the classification result. The SVM algorithm can therefore be defined as finding the separating line that is farthest from the training samples, also called the optimal straight line.
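The idea of preferring the separating line that is farthest from the training samples can be illustrated with a small sketch. The toy points and the two candidate lines below are hypothetical and are not fitted by an SVM solver; the code only compares how close each line comes to the nearest point.

```python
import math

def min_distance_to_line(points, w, b):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

def separates(points, labels, w, b):
    """True if every point lies on the side of the line that matches its label."""
    return all(lab * (w[0] * x + w[1] * y + b) > 0
               for (x, y), lab in zip(points, labels))

# Hypothetical toy data: two classes of coordinate points.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating lines (illustrative choices, not SVM output).
line_a = ((1.0, 1.0), -5.5)  # x + y - 5.5 = 0, runs midway between the classes
line_b = ((1.0, 1.0), -3.5)  # x + y - 3.5 = 0, hugs the -1 class

margin_a = min_distance_to_line(points, *line_a)
margin_b = min_distance_to_line(points, *line_b)
```

Both lines classify the toy points correctly, but `line_a` keeps a larger distance to its nearest point, so it is the less noise-sensitive choice in the sense described above.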

Figure 2. Classification algorithm principle
Starting from the definition of classification, assume that the training sample set $\{(x_i, y_i)\}$, $i = 1, \dots, l$, can be separated correctly by a hyperplane $G$, and that the distance between the vectors nearest to the classification plane and the plane reaches its maximum; then $G$ is the optimal classification hyperplane.
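Omitting the derivation of the optimization process, the resulting optimization problem is:
$$\max_{a} \ \sum_{i=1}^{l} a_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} a_i a_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \ \sum_{i=1}^{l} y_i a_i = 0, \ 0 \le a_i \le c, \ i = 1, \dots, l$$
where $x_i$ is the input and $y_i$ is the corresponding output value; $l$ is the number of samples, $a_i$ is the Lagrange coefficient, and $c$ is the regularization parameter, which balances margin maximization against classification error.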

SVM Solves Nonlinear Classification Problems
The goal of SVM classification is to develop a computationally efficient way of learning a "good" classification hyperplane in a high-dimensional feature space. SVM was originally proposed for two-class linearly separable problems in pattern recognition. Because stock market data are nonlinear, the classification ability of a hyperplane in the original space is limited, so the SVM applies a nonlinear mapping $\varphi: x \to f$ that maps the data into a higher-dimensional feature space $E$ in which the data become linearly separable, and the optimal hyperplane is then constructed there. Since both the optimization function and the classification function involve only the inner product $x_i \cdot x_j$ in the sample space, only the inner product $\varphi(x_i) \cdot \varphi(x_j)$ needs to be computed in the transformed high-dimensional feature space $E$. According to functional analysis, a suitable kernel function $K(x_i, x_j)$ corresponds to the inner product in a transformed space, so adopting an appropriate kernel function replaces the explicit nonlinear mapping into the high-dimensional space and realizes linear classification after a nonlinear transformation. According to optimality theory, the optimization problem is:
$$\max_{a} \ \sum_{i=1}^{l} a_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} a_i a_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \ \sum_{i=1}^{l} y_i a_i = 0, \ 0 \le a_i \le c, \ i = 1, \dots, l$$
Suppose $a^*$ solves this problem, and take any $i$ for which $c > a_i^* > 0$ holds. The decision rule given by $\operatorname{sgn}(f(x))$ is then equivalent to the hyperplane, in the feature space implicitly defined by the kernel function $K(x, z)$, that solves the optimization problem, where the slack variables are defined relative to the chosen margin. Then:
$$b^* = y_i - \sum_{j=1}^{l} a_j^* y_j K(x_j, x_i)$$
The corresponding decision function is:
$$\operatorname{sgn}(f(x)) = \operatorname{sgn}\left( \sum_{i=1}^{l} a_i^* y_i K(x_i, x) + b^* \right)$$
The sample data for the SVM algorithm are standardized by ranking: each stock is ranked by the size of the corresponding factor, and the rank is divided by the total number of stocks, so that the standardized factor score lies in the interval (0, 1].
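The rank-based standardization can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and sample values are hypothetical, ranks are assigned in ascending order of the factor value, and ties are broken by input order, none of which is specified in the text.

```python
def rank_standardize(factor_values):
    """Map raw factor values to scores in (0, 1] as rank / number of stocks.

    Assumption: rank 1 goes to the smallest factor value; ties keep input order.
    """
    n = len(factor_values)
    # Indices of the stocks sorted by factor value (stable sort).
    order = sorted(range(n), key=lambda i: factor_values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = rank / n
    return scores

# Hypothetical raw values of one factor for four stocks.
raw_factor = [0.8, 12.5, 3.1, 7.0]
standardized = rank_standardize(raw_factor)  # [0.25, 1.0, 0.5, 0.75]
```

Because only ranks are used, extreme factor values cannot dominate the input, and every factor ends up on the same scale.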
Next, the next-period returns are ranked from largest to smallest. The top 30% of stocks are taken as strong stocks and labeled +1, and the bottom 30% are taken as weak stocks and labeled -1. The middle 40% of stocks are removed from the training set, because their returns are neither strong nor weak and act as noisy data.
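The labeling rule can be sketched as below. The function name, the percentage parameters, and the sample returns are hypothetical; the 30% / 40% / 30% split follows the text.

```python
def label_by_next_return(next_returns, top_pct=30, bottom_pct=30):
    """Return {stock_index: label}; the middle stocks are left out as noise."""
    n = len(next_returns)
    # Stock indices sorted by next-period return, largest first.
    order = sorted(range(n), key=lambda i: next_returns[i], reverse=True)
    n_top = n * top_pct // 100
    n_bottom = n * bottom_pct // 100
    labels = {}
    for idx in order[:n_top]:
        labels[idx] = 1       # strong stocks
    for idx in order[n - n_bottom:]:
        labels[idx] = -1      # weak stocks
    return labels

# Hypothetical next-period returns for ten stocks.
next_returns = [0.05, -0.02, 0.10, 0.01, -0.07, 0.03, 0.08, -0.01, 0.00, -0.04]
labels = label_by_next_return(next_returns)
```

With ten stocks, three are labeled +1, three are labeled -1, and the remaining four are excluded from the training set.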
To find relatively stable and effective factors and to ensure the stability of the algorithm, this paper takes the factor data of the past 12 months as input samples.
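One way to organize the 12-month input samples is a rolling window, sketched below. The month labels and the function name are hypothetical, and the assumption that the model is re-estimated every month on the preceding 12 months is an illustration rather than a detail stated in the text.

```python
def rolling_windows(months, lookback=12):
    """Yield (training_months, prediction_month) pairs for a rolling backtest."""
    for t in range(lookback, len(months)):
        yield months[t - lookback:t], months[t]

# Hypothetical month labels: twelve months of 2016 plus three months of 2017.
months = [f"2016-{m:02d}" for m in range(1, 13)] + ["2017-01", "2017-02", "2017-03"]
windows = list(rolling_windows(months))
```

Each training window ends just before the month being predicted, so no information from the prediction month enters the training samples.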
From the theoretical derivation of SVM, once the optimal hyperplane has been solved, the samples are divided into the two classes $\{1, -1\}$, and the distance between a sample and the hyperplane indicates how confidently the sample is classified. It can be expressed as $d(x) = \frac{w \cdot x + b}{\|w\|}$, where $x$ is the new sample point and $w$, $b$ are the parameters of the computed hyperplane.
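A minimal sketch of this score, assuming the distance is measured as the signed quantity $(w \cdot x + b) / \|w\|$; the hyperplane parameters and sample points below are hypothetical and not taken from a trained model.

```python
import math

def signed_distance(x, w, b):
    """Signed distance from sample x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

# Hypothetical hyperplane parameters and two new sample points.
w = [3.0, 4.0]
b = -5.0
d_strong = signed_distance([3.0, 4.0], w, b)  # (9 + 16 - 5) / 5 = 4.0
d_weak = signed_distance([0.0, 0.0], w, b)    # (0 + 0 - 5) / 5 = -1.0
```

The sign gives the predicted class and the magnitude gives a score that can be used to rank stocks within each class.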
Based on these distances, the stocks are sorted and divided in the same way into 10 groups; the top group and the bottom group are selected as the strong portfolio and the weak portfolio, and the final step is to observe the backtest results.
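The grouping step can be sketched as follows. The stock identifiers and scores are hypothetical, and splitting into groups of near-equal size is an assumption about how the ten groups are formed.

```python
def split_into_groups(scores, n_groups=10):
    """Sort stocks by score (highest first) and split them into n_groups buckets."""
    order = sorted(scores, key=scores.get, reverse=True)
    n = len(order)
    groups = []
    for g in range(n_groups):
        start = g * n // n_groups
        end = (g + 1) * n // n_groups
        groups.append(order[start:end])
    return groups

# Hypothetical distance scores for twenty stocks, S00 (lowest) to S19 (highest).
scores = {f"S{i:02d}": float(i) for i in range(20)}
groups = split_into_groups(scores)
strong_portfolio = groups[0]   # ['S19', 'S18']
weak_portfolio = groups[-1]    # ['S01', 'S00']
```

A long-short strategy would then hold the strong portfolio and short the weak portfolio.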

Adaboost Model
Adaboost is an iterative algorithm whose core idea is to train different weak classifiers on the same training set, training repeatedly on the data that are hard to classify correctly, and then to combine all the weak classifiers into a strong classifier. The algorithm works by changing the distribution of the data: it adjusts the weight of each sample according to whether that sample was classified correctly and according to the overall classification accuracy of the previous round, so that the hard-to-classify data receive more training. The resulting strong classifier $H(x)$ is used to obtain a new score for each stock; the stocks are then divided into ten classes, and the top and bottom classes are observed.
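The reweighting idea described above can be sketched as follows. This is a minimal illustration of standard discrete Adaboost, not the paper's implementation: a one-feature threshold rule (decision stump) stands in for the SVM base classifier, and the toy data, function names, and number of rounds are hypothetical.

```python
import math

def train_stump(X, y, weights):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost(X, y, n_rounds=3):
    """Train weak classifiers on reweighted data; return [(alpha, stump), ...]."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, weights)
        err = max(stump[0], 1e-10)   # guard against log(1/0) on a perfect stump
        if err >= 0.5:               # no better than chance: stop
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * t * stump_predict(stump, row))
                   for w, row, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_score(ensemble, row):
    """Weighted vote of all weak classifiers; its sign is the class H(x)."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)

def strong_predict(ensemble, row):
    return 1 if strong_score(ensemble, row) >= 0 else -1

# Hypothetical toy data that no single threshold rule can classify correctly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, n_rounds=3)
predictions = [strong_predict(ensemble, row) for row in X]
```

On this toy set the best single stump misclassifies two of the six points, while the weighted vote of three stumps classifies all six correctly. In the stock selection setting, `strong_score` plays the role of the new score used to sort stocks into ten classes.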

Test of the Original SVM Stock Picking Strategy Model
Without considering nonlinear analysis, the rolling backtest on 12 months of sample data shows a good classification effect: the strong portfolio clearly outperforms the weak portfolio.

Test of the Adaboost-SVM Stock Picking Model Based on Adaboost Enhancement
Compared with the classification results of the linear SVM, the Adaboost portfolio built on 12 layers of data is more effective than the monthly SVM, and the returns of the long and short portfolios are clearly separated.

Figure 3. Net value of SVM algorithm stock picking

Figure 4. Net value of long-short strategy

Figure 5. Contrasts of long-short strategy's net value

Figure 6. Contrasts of strategy/index's net value

From the optimization function, we can conclude that the training complexity index is unrelated to the SVM method.
The new data set with the amended weights is then passed to the base classifier for training; finally, the classifiers obtained from each round of training are combined into the final decision classifier. The steps of the algorithm are: