A Simplified Variable Analysis of Credit Ratings for Small Chinese Enterprises Based on Support Vector Machine

Small enterprises are an important component of the national economy and valuable customers of commercial banks. Commercial banks use credit ratings, including financial and nonfinancial indices, to analyze small enterprises before committing to long-term collaborations, including loans. This paper uses a support vector machine algorithm to establish an imbalanced multi-classification model and compares the results to those of other methods. Commercial banks need simplified variable analysis credit ratings that use minimal information to rapidly and accurately obtain credit ratings and improve the efficiency of the process. Accordingly, we perform multiple tests of simplified rating systems using fewer variables.


Introduction
The Chinese government has always attached great importance to the development of small enterprises and continually improves the financial environment of small enterprises to further their development. However, it has always been very difficult for small enterprises to obtain financial support. In July 2014, Premier Li Keqiang repeatedly mentioned questions related to reducing the cost of financing for enterprises, especially small enterprises, at the State Council executive meeting. The limited sources of funds and the lack of continuous financial support have become a bottleneck in the sustainable development of small enterprises. Therefore, it is urgent to solve the financing problem of small enterprises by facilitating an effective financial support system to promote their development.
At the same time, commercial banks, as money suppliers, are also under significant pressure. After a few years of rapid development, Chinese commercial banks and the financial system, in general, have established a good financial foundation for financing enterprises. Under the macro-background of the liberalization of interest rates and financial innovations, which are increasingly advanced by the Internet, market competition between commercial banks has become increasingly fierce with increases in the cost of deposits and the interest rates on loans. This has compressed the profits that banks earn from the interest spread between deposits and loans. With fierce competition in the credit market, commercial banks need to obtain customer resources to increase their market share, especially in the form of small enterprises. Small enterprises have therefore become important customers for commercial banks. The credit ratings for small enterprises are key to solving the problem of financing small enterprises. The practical problem is finding an efficient and accurate method for obtaining a credit rating for new customers. This includes how to effectively identify and analyze the performance of small enterprises and how to eliminate small enterprises with poor performance.
There are two categories of credit rating methods: quantitative and qualitative. Qualitative evaluation methods are called artificial expert analyses and are also known as classic credit analysis methods. At present, Chinese commercial banks still primarily use this method. However, a few quantitative credit rating methods have also been used. Initially, Altman (1977) used multiple discriminant analysis, and Zhihui and Meng (2005) and Zhang (2010) used logistic model analysis. CreditMetrics was introduced by J.P. Morgan in the United States in 1997 (Morgan, 1997). This is a value-at-risk model that estimates the risk value of loans and other assets. McKinsey & Company designed the Credit Portfolio View model (McKinsey & Company, 1998), which is based on CreditMetrics. Their model added factors from the macro-economic cycle and established a relationship between macro-economic indicators, such as the economic growth rate, interest rate, and government expenditures, and the transition matrix of the credit rating. In addition, this model uses the Monte Carlo method to simulate changes in the transition probability of the rating with the degree of cyclical factors. The Credit Monitor model, developed by KMV Ltd. in the United States (Chen, 2014), estimates the probability of loan defaults. The Credit Risk model, issued by the financial products development department of the Swiss Credit Bank (Basel Committee on Banking Supervision, 2001), calculates the probability of defaults. There are many other artificial intelligence methods used for credit ratings, such as integer programming, artificial neural networks, genetic algorithms, and the support vector machine algorithm.

Support Vector Machine
The support vector machine method was first proposed by Cortes and Vapnik in 1995. It is easily combined with other methods and has subsequently been popularized. Because it has the advantage of solving nonlinear, high-dimensional classification and regression problems with relatively high accuracy, it is widely used in the fields of disease diagnosis, handwritten font and text recognition, face recognition and image retrieval, analysis and application in engineering technology, and evaluation and prediction in economic and management fields. Qifeng et al. (2005) selected more than 1000 sample data points for enterprises in the light industry from a certain commercial bank in 2003. The data included the ratios of the debt payment, profitability, operational management, and the output results of Grades AAA, AA, A, and A−. Empirical studies using support vector machine achieved an overall test accuracy of 83.15% with a faster learning speed than that of the neural network method. This formed a suitable credit rating method for commercial banks. Zhou et al. (2009) explored how to select the credit score parameters using support vector machine and produced good results with two real-world credit datasets. Kim et al. (2012) compared support vector machine and other artificial intelligence methods for multi-class problems and obtained an improved performance. Harris (2015) used a clustered support vector machine to solve the binary classification credit score problem and obtained better results than previous research. Ping-Feng (2015) proposed a new type of decision tree support vector machine that combined rough set theory and support vector machine to solve multi-class problems.
At present, there are a large number of studies that have been conducted based on support vector machine. Existing studies on the credit ratings of enterprises have mostly focused on problems of binary classification and less on the problem of multiple classifications. For binary classification problems, the same amount of sample data is generally chosen in the normal and control groups, and there have been few studies concerned with imbalanced classifications. In credit rating index systems for enterprises, indices are generally selected from financial statements that reflect historical information, and few qualitative indicators are used. Therefore, a support vector machine model for determining the credit ratings of enterprises could be further applied to play a more important role in practice.

Problem and Concept Analysis
According to the regulations and the collected sample data, this paper defines small enterprises as those with an owner equity of more than 6 million Yuan, a low number of employees, no complete financial reports audited or approved by third-party agencies, and an applied loan amount below 30 million Yuan. The credit rating index system for small enterprises needs more qualitative indices combined with the quantitative indices. Commercial banks could initially evaluate small enterprises using a decision system, and then, credit managers could analyze and judge whether to give loans to small enterprises using an artificial expert method. During this period, commercial banks would still need to regularly measure the risk exposure of older customers, pay close attention to the development of small enterprises, and reduce the possibility of bad debts as much as possible. Therefore, it is necessary to establish a model based on data mining and a machine-learning algorithm to provide information to the credit managers in the credit management and risk management departments according to the requirements and regulations of the new Basel agreement and the China Banking Regulatory Commission.
ijef.ccsenet.org International Journal of Economics and Finance Vol. 12, No. 6; 2020
In general, only a few enterprises in each commercial bank's database default. Based on the existing data in the databases of commercial banks, models were selected to identify possible defaulting customers among existing customers, which has the structure of an imbalanced classification problem. Imbalanced classification problems can be divided into binary imbalanced classification and multiple imbalanced classification problems.
At present, there are many studies focusing on the binary imbalanced classification problem. Multiple imbalanced classification problems generally refer to classification problems in which there are more than two categories and the numbers of samples in the groups differ significantly, especially because certain individual groups contain only a small amount of sample data. In addition, with many users using many different types of methods, it is difficult to fully learn the characteristics of each group, which leads to a decrease in the accuracy of the classifications. The credit rating problem for enterprises in commercial banks is a typical multiple imbalanced classification problem. First, there are 10 distinct credit rating grades, namely, AAA, AA, A, BBB, BB, B, CCC, CC, C, and D, and some commercial banks even add A+ and A− to make 12 credit rating grades, based on the different characteristics of management and the performance of the enterprises. Second, there are different numbers of enterprises in each sample data category in the commercial bank databases. For example, the vast majority of enterprises are above Grade A, and a few are below Grade BBB; there are also large differences between the Grade AAA, AA, and A sample groups. Third, the amount of sample data in a certain category might be zero. Because enterprises with low grades are rejected by commercial banks, there are no sample data for some categories (such as Grade D) in the customer databases of commercial banks. In this paper, we use a credit rating system with 10 grades, including Grades AAA, AA, A, BBB, BB, B, CCC, CC, C, and D, for a small commercial bank. Table 1 shows the number of enterprises in each grade for 164 enterprises in the customer database of the commercial bank. It can be seen from Table 1 that there are only seven grades with data and there are no sample data below Grade CCC.
Classification results for 7-10 grades might exist for different years; this is different from other multiple imbalance and binary classification problems in which the quantities of the sample data are nearly the same. A basic machine-learning algorithm model cannot effectively learn the characteristics of the information from this type of sample.
Existing studies indicate that there are three methods for solving an imbalanced classification problem: (1) improvements in the algorithm, (2) improvements in the data sampling technique, and (3) the simultaneous improvement of both the algorithm and the data sampling. Algorithm improvements could include changing the inherent characteristics and the original treatment principle of the algorithm, which would allow the calculation and analysis of the model to adapt to the requirements of the problem. Techniques for improving the data sampling focus on the selection methods for the data and can be used independently. Over-sampling increases the number of sample data in the minority grades, and under-sampling decreases the number of sample data in the majority grades. A hybrid algorithm combines the sampling and algorithm techniques.
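The two data-level techniques described above can be illustrated with a minimal sketch. It assumes plain random resampling, which is only one of many possible sampling schemes and is not necessarily the one used in this paper; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def _group_by_label(samples, labels):
    """Collect the samples belonging to each class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical toy data (not taken from Table 1): one feature per enterprise.
features = [[0.9], [0.8], [0.85], [0.7], [0.75], [0.6], [0.5], [0.55], [0.2]]
grades = ["AAA"] * 5 + ["AA"] * 3 + ["BB"] * 1

over_x, over_y = random_oversample(features, grades)
under_x, under_y = random_undersample(features, grades)
print(Counter(over_y))   # every grade now has as many samples as the largest grade
print(Counter(under_y))  # every grade now has as many samples as the smallest grade
```

Over-sampling keeps all information but repeats minority samples, whereas under-sampling discards majority samples; with very small grades, such as a grade holding one or two enterprises, under-sampling would leave almost no data, which is one reason algorithm-level changes are also considered.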
When training on sample datasets, more attention should be paid to the minority samples, and the data characteristics of the minority samples should be analyzed. The support vector machine (SVM), which is based on statistical learning theory (SLT), performs well in solving classification and regression problems. Therefore, the basic SVM algorithm can be improved so that it adapts to the characteristics of different classification and regression problems.

Methodology
According to the book "The theory and algorithms of support vector machines" written by Deng et al. (2005) and Shen (2004), the support vector machine method mostly solves the problems of regression and classification, which include linearly separable problems and linearly inseparable problems.
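For linearly separable problems, the training sample dataset in the binary classification problem is $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, n$, and the corresponding class label is $y_i \in \{-1, 1\}$. The classification hyperplane is $(w \cdot x) + b = 0$, where $w \cdot x$ is the dot product of $w$ and $x$. The two types of sample data both satisfy the constraint $y_i[(w \cdot x_i) + b] \geq 1$, $i = 1, 2, \ldots, n$, and the classification margin equals $2/\|w\|$. Under this constraint, the objective is to maximize $2/\|w\|$. Maximizing the classification margin is the same as minimizing $\|w\|^2/2$. The optimal classification hyperplane divides the sample data and minimizes $\|w\|^2/2$. Support vectors on the hyperplane contribute to the optimal hyperplane and the decision functions. Therefore, it is unnecessary to require all training data to satisfy $y_i[(w \cdot x_i) + b] \geq 1$, $i = 1, 2, \ldots, n$, and the constraint conditions can be relaxed to $y_i[(w \cdot x_i) + b] \geq 1 - \xi_i$ with $\xi_i \geq 0$, where $\sum_{i=1}^{n} \xi_i$ describes the degree to which the training set is incorrectly classified. The penalty parameter $C$ is an adjustable parameter greater than 0; a larger $C$ indicates a heavier punishment for faulty classifications. This is a quadratic programming problem, and the following equations are used to solve the optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \tag{1}$$

$$\text{s.t.} \quad y_i[(w \cdot x_i) + b] \geq 1 - \xi_i, \quad i = 1, 2, \ldots, n \tag{2}$$

$$\xi_i \geq 0, \quad i = 1, 2, \ldots, n \tag{3}$$

Consequently, the parameter $C$ is used to balance the training accuracy and the generalization ability. $\xi_i$ indicates the slack variables used to solve the problem over a larger feasible region, $w \in \mathbb{R}^n$ is the weight vector that determines the position of the separating hyperplane in the space, and $b$ is the offset that shifts the hyperplane. Because this is a quadratic programming problem, the optimal solution is the saddle point of the following Lagrange function:

$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left\{ y_i[(w \cdot x_i) + b] - 1 + \xi_i \right\} - \sum_{i=1}^{n} \beta_i \xi_i \tag{4}$$

where $\alpha_i \geq 0$ and $\beta_i \geq 0$ are the Lagrange multipliers. At the saddle point, the gradients of $L$ with respect to $w$, $b$, and $\xi$ are zero; therefore,

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \tag{5}$$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \tag{6}$$

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \beta_i = 0 \tag{7}$$

Inserting Eqs. (5)-(7) into Eq. (4) and calculating the maximum of Eq. (4) on $\alpha$, the dual optimization problem of Eqs. (1)-(3) is obtained as shown below:

$$\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{8}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{9}$$

$$0 \leq \alpha_i \leq C, \quad i = 1, 2, \ldots, n \tag{10}$$

In the solution of the above equations, each $\alpha_i$ satisfies $\alpha_i = 0$, $0 < \alpha_i < C$, or $\alpha_i = C$. If $0 < \alpha_i < C$ or $\alpha_i = C$, the corresponding $x_i$ is a support vector. In the support vector machine method, the $x_i$ corresponding to $\alpha_i = C$ is on the border and is known as a bound support vector. In addition, the $x_i$ corresponding to $0 < \alpha_i < C$ is in the interval and is known as a normal support vector. According to the Karush-Kuhn-Tucker (KKT) conditions, at the optimum points, the product of each Lagrange multiplier and its constraint condition equals 0:

$$\alpha_i \left\{ y_i[(w \cdot x_i) + b] - 1 + \xi_i \right\} = 0 \tag{11}$$

$$\beta_i \xi_i = 0 \tag{12}$$

For a normal support vector ($0 < \alpha_i < C$), it is known that $\beta_i > 0$ from Eq. (7), that $\xi_i = 0$ from Eq. (12), and that $y_i[(w \cdot x_i) + b] = 1$ from Eq. (11). Therefore, for any normal support vector, $b = y_i - \sum_{x_j \in J} \alpha_j y_j (x_j \cdot x_i)$, and averaging over all normal support vectors gives

$$b = \frac{1}{|JN|} \sum_{x_i \in JN} \left[ y_i - \sum_{x_j \in J} \alpha_j y_j (x_j \cdot x_i) \right]$$

Here, $JN$ is the set of normal support vectors, and $J$ is the set of support vectors. The constraints of Eqs. (2) and (3) limit $w$ and $b$ and make the empirical risk of error equal to 0. At the same time, $\|w\|$ is minimized to minimize the VC dimension. Therefore, the optimization of Eq. (1) embodies the principle of structural risk minimization and has a good generalization ability. This method could therefore solve linearly separable problems very well.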
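To make the distinction between bound and normal support vectors concrete, the following minimal sketch labels each training point from its Lagrange multiplier (written `alphas` here) and recovers the offset b by averaging over the normal support vectors. It assumes a linear kernel and an already solved dual problem; the function names, the tolerance `tol`, and the three-point toy problem are hypothetical and only illustrate the conditions stated in the text.

```python
def multiplier_category(alpha, C, tol=1e-8):
    """Label a training point by the value of its Lagrange multiplier."""
    if alpha <= tol:
        return "not a support vector"   # alpha = 0
    if alpha >= C - tol:
        return "bound support vector"   # alpha = C
    return "normal support vector"      # 0 < alpha < C


def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def bias_from_normal_svs(X, y, alphas, C, tol=1e-8):
    """Average y_i - sum_j alpha_j y_j (x_j . x_i) over the normal support vectors."""
    support = [j for j, a in enumerate(alphas) if a > tol]
    normal = [i for i in support if alphas[i] < C - tol]
    if not normal:
        raise ValueError("no normal support vector available")
    estimates = [
        y[i] - sum(alphas[j] * y[j] * dot(X[j], X[i]) for j in support)
        for i in normal
    ]
    return sum(estimates) / len(estimates)


# Hypothetical one-dimensional problem whose dual solution is known by hand:
# the separating hyperplane is x = 0 (w = 1, b = 0).
X = [[1.0], [-1.0], [3.0]]
y = [1, -1, 1]
alphas = [0.5, 0.5, 0.0]
C = 10.0

categories = [multiplier_category(a, C) for a in alphas]
b = bias_from_normal_svs(X, y, alphas, C)
print(categories)
print(b)
```

Averaging over all normal support vectors, rather than using a single one, is a common way to reduce the effect of numerical error in the multipliers.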
However, linearly inseparable problems indicate that using any straight line would incorrectly distinguish large amounts of data in the training set. For linearly inseparable problems, support vector machine selects a kernel function $K$, which is used on the sample data to map the dataset to a high-dimensional data space, transforming the linearly inseparable problem into a linearly separable problem and constructing an optimal hyperplane separating the points of the difficult nonlinear data. Different kernel functions obtain different classifiers, and the parameters used in the selection of the kernel function are very important. Under this condition, Eq. (8) changes to the following form:

$$\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

Here, $K(x_i, x_j) = [\varphi(x_i) \cdot \varphi(x_j)]$ is a kernel function that solves the dual problem to determine the final decision function:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)$$

If the kernel function $K(x_i, x_j)$ is appropriately selected, the linearly inseparable problem in input space can be transformed into a linearly separable problem in feature space. There are many different kernel functions that can be used in a support vector machine model. In this paper, the Gaussian radial basis function was used as the kernel function.
The sample data were mapped to a high-dimensional space using the kernel function, which was then used to solve the problem using the nonlinear relationship between the class labels and the characteristics of the data, as well as the problem of having an insufficient number of prior experiences. $\gamma$ is an inherent parameter of the kernel function that determines how the data are distributed after being mapped to the new feature space. $C$ is the penalty parameter; a larger value of $C$ leads to fewer training errors in the support vector machine classification model. The choice of parameters without prior knowledge was achieved via a grid search method, which is a common method used to set parameters.
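As an illustration of the Gaussian radial basis function kernel, the sketch below assumes the widely used parameterization K(x, z) = exp(-gamma * ||x - z||^2). The paper does not show which exact form it uses, so both this formula and the function name `rbf_kernel` are assumptions.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel under the assumed form exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature vectors already scaled to [0, 1].
x = [0.2, 0.7, 0.5]
z = [0.9, 0.1, 0.4]

same = rbf_kernel(x, x, gamma=0.005)      # identical points give the maximum value 1.0
small = rbf_kernel(x, z, gamma=0.005)     # small gamma: similarity decays slowly
large = rbf_kernel(x, z, gamma=5.0)       # large gamma: similarity decays quickly
print(same, small, large)
```

Under this form, a smaller gamma makes distant enterprises look more similar to each other and gives a smoother decision boundary, which is why gamma is tuned together with the penalty parameter C.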
The process in this paper, which uses an integrated learning algorithm to improve the support vector machine method, primarily includes three steps: segmentation, training, and aggregation. In our sample datasets, the positive subsets were the good credit ratings for small enterprises, such as Grades AAA, AA, A, BBB, BB, B, CCC, CC, C, and D. These enterprises do not default and make up a large percentage of the sample datasets. The poorer credit ratings of small enterprises were the negative subsets; they form the minority dataset in the customer database, such as Grades C or D, and could not easily be predicted by machine-learning methods because too few of their characteristics are available. The first step in the algorithm is segmentation, which reclassifies the existing sample groups to achieve nearly balanced groups. There are fewer data in the negative groups; therefore, they did not need to be further segmented.
Conversely, the data samples in the positive groups required detailed segmentation and classification. The sample groups could be divided into $k$ ($k \geq 3$) subclasses; for example, the classification result of customer information after data cleaning is multiple subcategories. The second step is training, which consolidates the sample data in the negative classes. For example, there were only two data points for grade 7 in the sample data, and these were the negative classes. If traditional methods were directly used to eliminate negative classes, there would be no data. As a result, the sample data for the negative classes were retained, and the sample data for the positive categories were the key points that could be classified in detail. Support vector machine was used to classify the sample data after segmentation. The third step is aggregation. After training the sample data, it is necessary to integrate each individual class to form a suitable method for this type of classified problem to distinguish all different classes according to the distance between each feature vector of the sample data for each class. First, the sample data were separated into negative classes, and then, the sample data were separated into positive categories. In addition, new test data were classified into appropriate classes via the support vector machine method.

Input
The known training set was the small enterprise sample data $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x$ represents the information of the different characteristics of each small enterprise and $y$ represents the corresponding grade of the small enterprise. The positive categories in the training set, $P$, were the good customer datasets for the credit rating, and the negative categories in the training set, $N$, were the poor customer datasets for the credit rating (the sample size of $P$ was $m_1$, the sample size of $N$ was $m_2$, $m_1 + m_2 = m$, and $m_1 \geq m_2$). $M$ is the number of categories in the positive sample dataset.

Output
Where sgn(x) is a sign function. The final output is the specific category y that corresponds to the arbitrary input of a sample vector x. In the learning process, it is important to pay attention to the choice of M, which is not only the number of categories in the positive data sample set but also the number of classifiers in the integrated learning algorithm. Because the size and distribution of the data affect the efficiency and accuracy of the support vector machine classifiers, the value of M should be chosen according to the actual sample data available each year; it is estimated that M lies between 6 and 11.

Data
The support vector machine method does not require the sample data to follow a normal distribution or to pass correlation tests. We collected sample data for small enterprises from the customer database of a city commercial bank in Zhejiang Province, China, in the financial year of 2017. Table 2 is a descriptive statistical analysis of the 164 enterprises and the 17 variables. According to the distribution characteristics of the original data, there were very few sample data that were less than zero. Most were greater than zero and uniformly distributed. To reduce collinearity between the different variables, the sample data were mapped to [0, 1] using an extreme-value (min-max) linear transformation. After this transformation, the values of each variable are ordered from small to large, and a larger value indicates a better result.
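The mapping to [0, 1] can be sketched as a min-max scaling of each variable. This is a minimal illustration that assumes the standard formula (x - min) / (max - min); the function name and the handling of constant columns are hypothetical choices, not taken from the paper.

```python
def min_max_scale(column):
    """Linearly map the values of one variable to the interval [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # A constant variable carries no ranking information; returning zeros
        # is an arbitrary convention chosen for this sketch.
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


# Hypothetical debt ratios of five enterprises.
debt_ratio = [0.35, 0.60, 0.10, 0.85, 0.45]
scaled = min_max_scale(debt_ratio)
print(scaled)
```

A variable that takes the same value for every enterprise cannot help separate the grades, which is consistent with such variables being dropped from the index system later in the paper.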
Complete credit rating methods for enterprises contain a credit rating index system. Small enterprises in this paper refer to enterprises that have an owner equity above 6 million Yuan. However, the numbers of employees are low, and complete financial statements audited or recognized by third-party agencies cannot be provided. The credit rating index system for small enterprises includes quantitative and qualitative indices. To obtain more accurate results, we drew on the experience of state-owned commercial banks in China and had many discussions with experts. After these discussions, the index system was redesigned as shown in Table 3.

Empirical Results
The result of the experiment was the predicted precision ratio:

$$\text{accuracy rate} = \frac{\text{number of correctly classified samples}}{\text{number of all samples}} \tag{21}$$

For example, $C$ was selected in the range of 100-10,000 for the experiment and was increased in steps of $10^n$. $\gamma$ was selected from a range with steps of $10^{-n}$. According to the results of the convergence and the accuracy of the precision, a gradual narrowing of the scope should produce a higher classification accuracy. If $C$ was small and the actual testing accuracy was low, then $C$ was gradually increased and approached the optimal value range for support vector machine. Every trial required approximately 10 min. Owing to time and energy limitations, after repeated testing and analyses, parameter combinations in the classification model were ruled out if they led to divergent rather than convergent results. The retained parameter values were $C$ = (1800, 1900, 2000, 2100, 2200, 2500, 2700) and $\gamma$ = (0.001, 0.003, 0.005, 0.007, 0.009). The test results in this range were better than those for other parameters. Each combination of $C$ and $\gamma$ defined a different classification model of the support vector machine. The accuracy of each classification model was averaged over 20 runs. There were 35 different testing results for the analysis, as shown in Table 4, after the use of the integrated support vector model. Other methods were also used to classify the same sample data of the 164 enterprises to compare to the results of the support vector machine classification method. Table 5 shows the results. It can be seen that the accuracy of the experimental results using other methods is low. After interviews with the credit managers at commercial banks, the credit managers cooperated with the risk assessment manager to identify the potential risks of loans to given enterprises.
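The parameter grid and the accuracy measure described above can be written out as a short sketch. The seven values of C and the five values of the kernel parameter (written gamma here) are those listed in the text; the function names and the toy predictions are hypothetical, and no classifier is trained in this example.

```python
from itertools import product

# Parameter values reported in the text.
C_VALUES = (1800, 1900, 2000, 2100, 2200, 2500, 2700)
GAMMA_VALUES = (0.001, 0.003, 0.005, 0.007, 0.009)


def accuracy_rate(predicted, actual):
    """Number of correctly classified samples divided by the number of all samples."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_accuracy(run_accuracies):
    """Average the accuracy over repeated runs of one parameter combination."""
    return sum(run_accuracies) / len(run_accuracies)


# Every (C, gamma) pair defines one classification model to be evaluated.
grid = list(product(C_VALUES, GAMMA_VALUES))
print(len(grid))  # 7 values of C times 5 values of gamma

# Hypothetical predictions for four enterprises, for illustration only.
example = accuracy_rate(["A", "AA", "A", "BBB"], ["A", "AA", "AA", "BBB"])
print(example)
```

In a full experiment, each of the 35 pairs would be used to train and test a classifier repeatedly, and `mean_accuracy` would be applied to the accuracies of the repeated runs.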
In practice, the amount of variable data needed to analyze small enterprises rapidly and accurately should be as small as possible. Therefore, the credit rating variables were reduced and eliminated to see if ideal results could be obtained. Because there were 17 collected variables, there were numerous different possible combinations of variables, not all of which could be tested. Dimensionality reduction using factor analysis and principal component analysis was found to be unsuitable for credit rating analyses in commercial banks. The principal components and factors calculated were not stable, and it was difficult to interpret the results. Therefore, we could only delete variables according to a correlation analysis, which was based on changes in the accuracy rate, to determine combinations of variables.
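One ingredient of such a correlation-based screening is a pairwise correlation coefficient between variables. The sketch below assumes the Pearson coefficient, which the paper does not name explicitly; the function name and the toy columns are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equally long variables."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_u * var_v)


# Hypothetical scaled variables for five enterprises.
sales_growth = [0.1, 0.4, 0.5, 0.8, 1.0]
profit_growth = [0.2, 0.5, 0.4, 0.9, 1.0]
r = pearson(sales_growth, profit_growth)
print(round(r, 3))
```

Two variables with a coefficient close to 1 or -1 carry largely the same information, so one of them is a natural candidate for removal; the final decision in the paper is still based on how the accuracy rate changes when a variable is dropped.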
First, 16 variables were selected from the 17 variables of the valid sample data for the 164 small enterprises. After standardization of the sample data, the owner equity variable was equal to 1; therefore, this variable was eliminated. The new index system included 16 variables: the working years of the manager, educational background of the manager, corporate lifetime, investors' assets, sales output ratio, debt ratio, current ratio, accounts receivable turnover, sales growth rate, profit growth rate, return on equity, personal credit record of the manager, industry policy, local environment, operating site conditions, and equipment utilization. The test results are shown in Table 6. Table 6 indicates that the accuracy rate of the classification with 16 variables is lower than that with 17 variables, showing that support vector machine is more effective for high-dimensional classification problems with higher accuracy.
Next, 16 variables were again selected from the 17 variables of the valid sample data. The personal credit records of the managers were all good in the sample data, so the standardized value of this variable was 1 for every enterprise, and the variable was eliminated. As a result, 16 variables were used for testing, including the working years of the manager, educational background of the manager, corporate lifetime, investors' assets, sales output ratio, debt ratio, owner's equity, current ratio, accounts receivable turnover, sales growth rate, profit growth rate, return on equity, industry policy, local environment, operating site conditions, and equipment utilization. The test results are shown in Table 7. A third test used eight variables, including the debt ratio, current ratio, sales growth rate, return on equity, corporate lifetime, industry policy, investors' assets, and sales output ratio, and omitted the working years of the manager, educational background of the manager, owner's equity, accounts receivable turnover, profit growth rate, personal credit record of the manager, local environment, operating site conditions, and equipment utilization. The test results are shown in Table 8. A fourth test also used eight variables, including the debt ratio, current ratio, sales growth rate, return on equity, corporate lifetime, industry policy, investors' assets, and sales output ratio and omitting the educational background of the manager, working years of the manager, owner's equity, current ratio, accounts receivable turnover, profit growth rate, return on equity, personal credit records of the manager, industry policy, local environment, operating site conditions, and equipment utilization. The test results are shown in Table 9. Each combination of the parameters C and γ constitutes a support vector classification machine.
As a result, there were 35 support vector classification machines in one table, all operated multiple times, which built more than a thousand classifiers of the support vector machine. The result of the operations was the average accuracy rate after 20 repetitions, which enhanced the precision and robustness of the classification. From Table 10, it can be seen that the accuracy results with the other methods are low.

Conclusions
In China, it is necessary for commercial banks to identify the credit ratings of small enterprises. This paper uses a suitable ensemble support vector machine method to analyze sample data from a customer database in a commercial bank. After many tests and analyses, the index system of the variables was gradually adjusted. Because the characteristics of support vector machine are suitable for high-dimensional nonlinear classification problems, more features were included in the variable indices. Therefore, the attained accuracy was higher. The support vector machine method does not require the sample data to have a normal distribution nor does it need correlation tests to solve this type of imbalanced multi-classification problem and to enhance the precision and robustness of the classification.
Figure 1. A schematic diagram of the relationship between the number of variables and the accuracy rate of the classification (horizontal axis: the number of variables; vertical axis: the accuracy rate of classification)

From Figure 1, it can be seen that when there were 15-17 variables in the index system sample data, the accuracy rate of the classification was close to 80%. Decreasing the number of variables in the index system caused the accuracy rate of the classification to decrease. The accuracy rate for an index system of eight variables was over 62%. For some combinations of parameters, it was above 70%, which is a relatively good evaluation accuracy. When the number of variables decreased to seven, the accuracy quickly fell below 60% (Figure 1).