Robust Support Vector Regression Model in the Presence of Outliers and Leverage Points

Support vector regression is used to model linear and nonlinear relationships among variables. Although it is a non-parametric technique, it is still affected by outliers, because outliers can be selected as support vectors. In this article, we propose a robust support vector regression for linear and nonlinear target functions. To achieve this goal, a support vector regression model with fixed parameters is used to detect abnormal points in the data set and to minimize their effects. The efficiency of the proposed method is investigated using real and simulated examples.


Introduction
Support Vector Regression (SVR) belongs to a class of learning algorithms introduced by Cortes and Vapnik (1995). It is a universal technique for handling regression problems. Since its introduction, SVR has attracted the interest of researchers due to its excellent performance on a variety of learning problems (Ceperic, 2014; Dhhan et al., 2015). Additional reasons stand behind the wide use of SVR, such as lower sensitivity to local minima, theoretical guarantees about its performance, and high flexibility to add extra dimensions to the input space without increasing the model complexity (Ceperic, 2014). According to Chuang et al. (2002), outliers may be selected as support vectors. Further, the largest Lagrange multipliers in Eq. (4) mostly belong to data points that are outliers in the training data, and these points control the model (Jordaan & Smits, 2004).
In general, samples are always subject to unusual data points, called outliers. Hawkins (1980) defines an outlier as "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism". There are various reasons behind the presence of outliers: misplaced decimal points, measurement errors that come from lack of experience, and exceptional phenomena such as earthquakes (Rousseeuw & Leroy, 1987). Robust regression is concerned with developing estimators that are not sensitive to outliers or to the noise distribution (Calafiore, 2000). The main idea of robust methods is to give low weights to outliers.
In this study, we propose a robust SVR based on the fixed-parameter SVR technique (FP-SVR), which we call double support vector regression (DSVR). This article shows that the proposed robust method achieves higher efficiency than the standard SVR model. The paper is organized as follows: a brief description of the support vector regression model is given in Section 2. In Section 3, we present the proposed method based on FP-SVR. In Sections 4 and 5, we apply the proposed method to real and simulated data sets. Finally, a discussion of the results is given in Section 6.

Support Vector Regression
In the SVR technique, the input vector is first mapped onto a high-dimensional feature space, which is nonlinearly related to the input space. The idea of the SVR model is to employ a kernel function to transform the nonlinear relationship in the input space into a linear form in a high-dimensional feature space (Vapnik, 2000). The linear model is given by

f(x, w) = ⟨w, Φ(x)⟩ + b,     (1)

where Φ(x) denotes a nonlinear mapping, and w and b are the slope and the bias term, respectively.
The support vector regression technique aims to estimate the parameter values w and b that optimize the predicted risk by minimizing the ε-insensitive loss function. In order to find a flat function f(x, w), the Euclidean norm ‖w‖ must be minimized (Smola and Schölkopf, 2004). This can be done by introducing positive slack variables (ξᵢ, ξᵢ*) that measure the deviations of the training vectors outside the ε-tube. Thus, a convex optimization problem can be formulated as:

minimize   (1/2)‖w‖² + C Σᵢ (ξᵢ + ξᵢ*)     (2)

subject to
  yᵢ − ⟨w, Φ(xᵢ)⟩ − b ≤ ε + ξᵢ,
  ⟨w, Φ(xᵢ)⟩ + b − yᵢ ≤ ε + ξᵢ*,
  ξᵢ, ξᵢ* ≥ 0.     (3)

The coefficient C defines the trade-off between the complexity of the model and the extent to which deviations larger than ε are tolerated (Smola and Schölkopf, 2004). The coefficient ε controls the width of the ε-zone used to fit the training data (Vapnik, 2000). Solving this problem in its dual form, the estimated SVR function can be written as:

f(x) = Σᵢ (αᵢ − αᵢ*) K(xᵢ, x) + b,     (4)

where αᵢ and αᵢ* are the Lagrange multipliers and K(·, ·) is the kernel function.
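The ε-SVR model of Eqs. (2)-(4) can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `SVR` as the solver (the paper does not name a library); `dual_coef_` exposes the differences (αᵢ − αᵢ*) of Eq. (4) for the support vectors.

```python
# Minimal epsilon-SVR sketch (assumes scikit-learn; the paper names no library).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# C trades model flatness against deviations; epsilon sets the tube width.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

# dual_coef_ holds (alpha_i - alpha_i^*) for each support vector,
# i.e. the Lagrange multipliers entering Eq. (4).
print(model.support_vectors_.shape)
print(model.dual_coef_.shape)
```

Only points outside the ε-tube become support vectors, so `dual_coef_` typically has fewer columns than there are training points.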

The Proposed Method Based on Fixed Parameter SVR
The main contribution of this study is to detect outliers and leverage points first, and then to minimize their effects. To achieve the first step, we use FP-SVR (Dhhan et al., 2015) to detect outliers and leverage points. The FP-SVR technique is insensitive to outliers because it controls the free parameters (ε, C, and h). According to the FP-SVR model (5), the optimal parameters for detecting outliers are ε = 0, C = 100000, and h = 1.

f(x) = Σᵢ (αᵢ − αᵢ*) K(xᵢ, x) + b,   0 ≤ αᵢ, αᵢ* ≤ C.     (5)

Based on FP-SVR, any point whose Lagrange multiplier exceeds the cutoff point CP in Eq. (6) is considered an outlier. This method can be employed to prevent the effects of abnormal data on the estimation, which is done using the following weight function.
Eq. (4) can then be rewritten using this weight function to obtain the final robust SVR function.

Artificial and Real Case Studies
In this part, we apply our proposed method (DSVR) to the Belgian phone data, the Hawkins-Bradu-Kass data, and a simulated example (Rousseeuw and Leroy, 1987), and compare the results with standard SVR. The mean squared error (MSE) of the residuals is used to evaluate the proposed technique.

Belgian Phone Data
In this example, the proposed technique is tested on real data containing vertical outliers. The Belgian Statistical Survey data set records the total number of international phone calls made between 1950 and 1973 and is heavily contaminated (Rousseeuw and Leroy, 1987). As shown in Table 1 and Figure 1, the proposed approach minimizes the effects of outliers much better than the standard SVR model.

Hawkins-Bradu-Kass Data
To illustrate the superiority of the proposed method, we apply it to the Hawkins-Bradu-Kass data. The data set consists of three predictors in addition to the dependent variable, with 75 observations. The first 10 observations are classified as bad leverage points (they distort the regression estimator), and the next four observations are considered good leverage points (they have little or no effect on the regression estimator). The comparison results are recorded in Table 2 and Figure 2. These results clearly show the superiority of the proposed DSVR method over the standard SVR method in terms of achieving lower MSE values. Based on these results, we recommend the proposed method for this data set.
Figure 2. The MSE of SVR and the DSVR methods for HBK data

Simulation Study
In order to generalize the DSVR technique, we simulate a nonlinear regression model with two predictors.
The values of x are sampled from a uniform distribution, while the residuals εᵢ are simulated from a standard normal distribution. We contaminate the data with different outlier percentages (10%, 15%, and 20%). This contamination is implemented by replacing some values of y with an extreme value equal to 100.
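The contamination scheme above can be sketched as follows. The paper's exact nonlinear target function is not reproduced in this text, so a generic two-predictor nonlinear function is assumed as a stand-in; the uniform predictors, standard normal residuals, and the replacement of responses by the extreme value 100 follow the description above.

```python
# Sketch of the simulation design (assumed target function; see lead-in).
import numpy as np

def simulate(n, contamination, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, size=(n, 2))        # two uniform predictors
    eps = rng.standard_normal(n)              # N(0, 1) residuals
    # Assumed stand-in for the paper's nonlinear target:
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + eps
    # Replace a fraction of responses by the extreme value 100.
    n_out = int(round(contamination * n))
    idx = rng.choice(n, size=n_out, replace=False)
    y[idx] = 100.0
    return X, y, idx

X, y, idx = simulate(100, 0.10)
```

Running `simulate` at contamination levels 0.10, 0.15, and 0.20 reproduces the three scenarios compared in Table 3.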
The results of the comparison between the proposed DSVR and the standard SVR are shown in Table 3, which reports the estimates of DSVR and SVR for different sample sizes and contamination levels. These results are displayed graphically in Figures 3, 4 and 5, and reveal that DSVR has a smaller MSE than the standard SVR.

Conclusion
In this article, we have proposed a robust support vector regression technique for nonlinear functions. Although SVR is a nonparametric method, it is still affected by outliers. To compare our proposed method, DSVR, with the standard SVR method, real and simulated examples were used. The results were calculated for several SVR parameter settings and sample sizes. The comparison demonstrates the superiority of the proposed DSVR method over the standard SVR method for nonlinear target functions.

Table 1. The MSE of the SVR and DSVR methods for Belgian phone data

Figure 1. The MSE of the SVR and DSVR methods for Belgian phone data

Figure 3. The MSE of the SVR and DSVR methods for different sample sizes and a 10% percentage of contamination

Table 2. The MSE of the SVR and DSVR methods for HBK data

Table 3. The MSE of the SVR and DSVR methods for different sample sizes and percentages of contamination