A Regularized Newton Method with Correction for Unconstrained Nonconvex Optimization

In this paper, we present a modified regularized Newton method for minimizing a nonconvex function whose Hessian matrix may be singular. We show that if the gradient and Hessian of the objective function are Lipschitz continuous, then the method has a global convergence property. Under the local error bound condition, which is weaker than nonsingularity, the method converges cubically.


Introduction
We consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable, with gradient $\nabla f$ and Hessian $\nabla^2 f$ denoted by $g(x)$ and $H(x)$, respectively. Throughout this paper, we assume that the solution set of (1) is nonempty and denote it by $X$; in all cases $\|\cdot\|$ refers to the 2-norm.
It is well known that $f(x)$ is convex if and only if $H(x)$ is symmetric positive semidefinite for all $x \in \mathbb{R}^n$. Moreover, if $f(x)$ is convex, then $x \in X$ if and only if $x$ is a solution of the system of nonlinear equations
$$g(x) = 0. \tag{2}$$
Hence, we can obtain a minimizer of $f(x)$ by solving (2) (C.T. Kelley, 1999; W. Sun, 2006; W. Zhou, 2008). The Newton method is one of the most efficient methods for doing so. At every iteration, it computes the trial step
$$d_k = -H_k^{-1} g_k, \tag{3}$$
where $g_k = g(x_k)$ and $H_k = H(x_k)$. As is well known, if $H_k$ is Lipschitz continuous and nonsingular at the solution, then the Newton method has quadratic convergence. However, this method has an obvious disadvantage when $H_k$ is singular or nearly singular.
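To make the difficulty concrete, here is a minimal numpy sketch (our own toy example, not from the paper): for $f(x, y) = x^4/12 + y^2$ the Hessian is singular whenever $x = 0$, so the plain Newton step (3) is undefined there.

```python
import numpy as np

def newton_step(H, g):
    """Plain Newton step: solve H d = -g; breaks down when H is singular."""
    return np.linalg.solve(H, -g)

# Toy example f(x, y) = x**4 / 12 + y**2, evaluated at (0, 0.5):
H = np.array([[0.0, 0.0],   # d2f/dx2 = x**2 vanishes at x = 0
              [0.0, 2.0]])
g = np.array([0.0, 1.0])    # gradient (x**3 / 3, 2*y) at (0, 0.5)

try:
    d = newton_step(H, g)
except np.linalg.LinAlgError:
    print("singular Hessian: the plain Newton step is undefined")
```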
To overcome the difficulty caused by the possible singularity of $H_k$, (D. Sun, 1999) proposed a regularized Newton method, where the trial step is the solution of the linear equations
$$(H_k + \mu_k I)\, d_k = -g_k, \tag{4}$$
where $I$ is the identity matrix and $\mu_k$ is a positive parameter updated from iteration to iteration. This raises another question: how should the regularization parameter $\mu_k$ be chosen? The choice plays an important role not only in the theoretical analysis but also in numerical experiments. Yamashita and Fukushima (D. H. Li, 2004) chose $\mu_k = \|g_k\|^2$ and showed that the regularized Newton method has quadratic convergence under the local error bound condition, which is weaker than nonsingularity. Fan and Yuan (J.Y. Fan, 2005) took $\mu_k = \|F_k\|^\delta$ with $\delta \in [1, 2]$ and showed that the Levenberg-Marquardt method preserves quadratic convergence under the same conditions. Numerical results (J.Y. Fan, 2009; Jinyan Fan, 2014) show that the choice $\mu_k = \|F_k\|$ is more stable and preferable.
In most past studies (N. Yamashita, 2001; Polyak, 2009; J.Y. Fan, 2009; Jinyan Fan, 2014) of the regularized Newton method, the convergence properties have been discussed only when $f$ is convex. In this paper, we propose a modified Newton method for (1) whose objective function $f$ is nonconvex, mainly motivated by (Dong-hui Li, 2004). Dong-Hui Li, Masao Fukushima, Liqun Qi and Nobuo Yamashita proposed that regularized Newton and inexact Newton methods can be extended to nonconvex minimization problems (Dong-hui Li, 2004). They chose $\Lambda_k$ to satisfy
$$\max\{0, -\lambda_1(H_k)\} \le \Lambda_k \le \max\{0, -\lambda_1(H_k)\} + C,$$
where $C > 0$ is a constant and $\lambda_1(H_k)$ is the minimum eigenvalue of $H_k$. Based on the better performance of the modified regularized method with $\Lambda_k = \max\{0, -\lambda_{\min}(H_k)\}$, we consider the choice
$$\Lambda_k = \max\{0, -\lambda_{\min}(H_k)\}$$
in this paper.
We extend the regularized Newton method (4) to unconstrained nonconvex optimization. At the $k$-th iteration of the modified regularized Newton method, we set the regularized parameter $\mu_k$ as
$$\mu_k = \alpha_1 \Lambda_k + \lambda_k \|g_k\|.$$
From the definition of $\Lambda_k$, the matrix $H_k + \mu_k I$ is symmetric positive definite whenever $g_k \neq 0$, so we can use the regularized Newton method to solve problem (1).
The main scheme of the modified regularized Newton method for unconstrained nonconvex optimization is given as follows. At every iteration, it solves the linear equations
$$(H_k + \mu_k I)\, d_k = -g_k$$
to obtain the Newton step $d_k$, where $\mu_k = \alpha_1 \Lambda_k + \lambda_k \|g_k\|$, and then solves the linear equations
$$(H_k + \mu_k I)\, \hat{d}_k = -g(x_k + d_k)$$
to obtain the approximate Newton step $\hat{d}_k$. The trial step is then $s_k = d_k + \hat{d}_k$.
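The following is a minimal numpy sketch of the linear algebra in one iteration, under the reconstruction above; `alpha1` and `lam_k` stand for $\alpha_1$ and $\lambda_k$, and `g` is assumed to be a callable returning the gradient.

```python
import numpy as np

def trial_step(H_k, g, x_k, alpha1, lam_k):
    """Compute the trial step s_k = d_k + d_hat of the modified method:
    Lambda_k = max(0, -lambda_min(H_k)), mu_k = alpha1*Lambda_k + lam_k*||g_k||."""
    g_k = g(x_k)
    lam_min = np.linalg.eigvalsh(H_k).min()        # smallest eigenvalue of H_k
    Lambda_k = max(0.0, -lam_min)
    mu_k = alpha1 * Lambda_k + lam_k * np.linalg.norm(g_k)
    A = H_k + mu_k * np.eye(len(g_k))              # symmetric positive definite
    d_k = np.linalg.solve(A, -g_k)                 # Newton step
    d_hat = np.linalg.solve(A, -g(x_k + d_k))      # correction step
    return d_k + d_hat
```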
The paper is organized as follows. In Section 2, we present a new modified regularized Newton algorithm using a trust region technique and prove its global convergence. In Section 3, we study the convergence rate of the algorithm and obtain cubic convergence under the local error bound condition. Finally, we conclude the paper in Section 4.

The Algorithm and Global Convergence
First, we give the modified regularized Newton algorithm. Define the actual reduction of $f(x)$ at the $k$-th iteration as
$$\mathrm{Ared}_k = f(x_k) - f(x_k + d_k + \hat{d}_k).$$
Note that the Newton step $d_k$ is the minimizer of the problem
$$\min_{d \in \mathbb{R}^n} \; \varphi_{k,1}(d) := g_k^T d + \tfrac{1}{2} d^T (H_k + \mu_k I) d,$$
and hence also the solution of the trust region problem
$$\min_{d \in \mathbb{R}^n} \; g_k^T d + \tfrac{1}{2} d^T H_k d \quad \text{s.t.} \quad \|d\| \le \|d_k\|.$$
By the famous result given by Powell in (M.J.D. Powell, 1975), we know that
$$-\Big(g_k^T d_k + \tfrac{1}{2} d_k^T H_k d_k\Big) \ge \tfrac{1}{2}\|g_k\| \min\Big\{\|d_k\|,\; \frac{\|g_k\|}{\|H_k\|}\Big\}. \tag{10}$$
Similar to $d_k$, the correction step $\hat{d}_k$ is not only the minimizer of the problem
$$\min_{d \in \mathbb{R}^n} \; \varphi_{k,2}(d) := g(x_k + d_k)^T d + \tfrac{1}{2} d^T (H_k + \mu_k I) d,$$
but also the solution of the following trust region problem:
$$\min_{d \in \mathbb{R}^n} \; g(x_k + d_k)^T d + \tfrac{1}{2} d^T H_k d \quad \text{s.t.} \quad \|d\| \le \|\hat{d}_k\|.$$
Therefore we also have
$$-\Big(g(x_k + d_k)^T \hat{d}_k + \tfrac{1}{2} \hat{d}_k^T H_k \hat{d}_k\Big) \ge \tfrac{1}{2}\|g(x_k + d_k)\| \min\Big\{\|\hat{d}_k\|,\; \frac{\|g(x_k + d_k)\|}{\|H_k\|}\Big\}. \tag{11}$$
Based on the inequalities (10) and (11), it is reasonable to define the new predicted reduction as
$$\mathrm{Pred}_k = -\Big(g_k^T d_k + \tfrac{1}{2} d_k^T H_k d_k\Big) - \Big(g(x_k + d_k)^T \hat{d}_k + \tfrac{1}{2} \hat{d}_k^T H_k \hat{d}_k\Big), \tag{12}$$
which satisfies
$$\mathrm{Pred}_k \ge \tfrac{1}{2}\|g_k\| \min\Big\{\|d_k\|,\; \frac{\|g_k\|}{\|H_k\|}\Big\}.$$
The ratio of the actual reduction to the predicted reduction plays a key role in deciding whether to accept the trial step and how to adjust the regularized parameter.
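In code, the ratio $r_k$ can be computed as in the following sketch, where the predicted reduction follows the reconstructed definition (12); `f` and `g` are assumed to be callables for the objective and gradient.

```python
import numpy as np

def reduction_ratio(f, g, H_k, x_k, d_k, d_hat):
    """r_k = Ared_k / Pred_k, with Pred_k defined as in (12)."""
    ared = f(x_k) - f(x_k + d_k + d_hat)                        # actual reduction
    pred = -(g(x_k) @ d_k + 0.5 * d_k @ H_k @ d_k) \
           - (g(x_k + d_k) @ d_hat + 0.5 * d_hat @ H_k @ d_hat) # predicted reduction
    return ared / pred
```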
The regularized Newton algorithm with correction for unconstrained nonconvex optimization problems is stated as follows.
Algorithm 2.1
Step 1. Given $x_0 \in \mathbb{R}^n$, $\alpha_1 \ge 1$, $\lambda_0 > m > 0$, $0 < p_0 \le p_1 \le p_2 < 1$ and a tolerance $\varepsilon \ge 0$, set $k := 0$.
Step 2. If $\|g_k\| \le \varepsilon$, stop. Otherwise compute $\Lambda_k = \max\{0, -\lambda_{\min}(H_k)\}$ and $\mu_k = \alpha_1 \Lambda_k + \lambda_k \|g_k\|$, and solve
$$(H_k + \mu_k I)\, d = -g_k$$
to obtain $d_k$.
Step 3. Solve
$$(H_k + \mu_k I)\, d = -g(x_k + d_k)$$
to obtain $\hat{d}_k$. Set $s_k = d_k + \hat{d}_k$.
Step 4. Compute $r_k = \mathrm{Ared}_k / \mathrm{Pred}_k$. Set
$$x_{k+1} = \begin{cases} x_k + s_k, & \text{if } r_k \ge p_0,\\ x_k, & \text{otherwise.} \end{cases}$$
Step 5. Update $\lambda_{k+1}$ as
$$\lambda_{k+1} = \begin{cases} 4\lambda_k, & \text{if } r_k < p_1,\\ \lambda_k, & \text{if } p_1 \le r_k \le p_2,\\ \max\{\lambda_k/4,\, m\}, & \text{if } r_k > p_2. \end{cases}$$
Set $k := k + 1$ and go to Step 2.
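Putting the pieces together, the following self-contained Python sketch implements Algorithm 2.1 as reconstructed above. The update factors 4 and 1/4 and the thresholds `p0`, `p1`, `p2` are illustrative choices in the usual Levenberg-Marquardt style, not values prescribed by the paper.

```python
import numpy as np

def modified_regularized_newton(f, g, H, x0, alpha1=2.0, lam0=1.0, m=1e-8,
                                p0=1e-4, p1=0.25, p2=0.75, tol=1e-8, max_iter=500):
    """Sketch of Algorithm 2.1: regularized Newton with a correction step."""
    x, lam = np.asarray(x0, dtype=float), lam0
    for _ in range(max_iter):
        gk = g(x)
        if np.linalg.norm(gk) <= tol:                    # Step 2: stopping test
            break
        Hk = H(x)
        Lam = max(0.0, -np.linalg.eigvalsh(Hk).min())    # Lambda_k
        mu = alpha1 * Lam + lam * np.linalg.norm(gk)     # mu_k
        A = Hk + mu * np.eye(x.size)                     # positive definite
        d = np.linalg.solve(A, -gk)                      # Newton step d_k
        d_hat = np.linalg.solve(A, -g(x + d))            # Step 3: correction step
        s = d + d_hat                                    # trial step s_k
        ared = f(x) - f(x + s)                           # Step 4: reduction ratio
        pred = -(gk @ d + 0.5 * d @ Hk @ d) \
               - (g(x + d) @ d_hat + 0.5 * d_hat @ Hk @ d_hat)
        r = ared / pred if pred > 0 else -np.inf
        if r >= p0:
            x = x + s                                    # accept the trial step
        if r < p1:                                       # Step 5: update lambda_k
            lam = 4.0 * lam
        elif r > p2:
            lam = max(lam / 4.0, m)
    return x
```

The exact eigenvalue computation costs $O(n^3)$ per iteration; for large problems one could instead estimate $\lambda_{\min}(H_k)$ with a few Lanczos iterations.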
Before discussing the global convergence of the algorithm above, we make the following assumption.
Assumption 2.1 $g(x)$ and $H(x)$ are both Lipschitz continuous, that is, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\| \quad \text{and} \quad \|H(x) - H(y)\| \le L \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n.$$
The following lemma shows the relationship between a positive semidefinite matrix and a symmetric positive semidefinite matrix.
Lemma 2.1 A real matrix $A$ is positive semidefinite if and only if $(A + A^T)/2$ is positive semidefinite.
Next, we give bounds on a regularized positive semidefinite matrix and its inverse.
Lemma 2.2 Suppose $A$ is symmetric positive semidefinite. Then
$$\|(A + \varphi I)^{-1}\| \le \frac{1}{\varphi} \quad \text{and} \quad \|(A + \varphi I)^{-1} A\| \le 1$$
hold for any $\varphi > 0$.
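As a quick numerical sanity check of these bounds (illustrative only, with a random positive semidefinite matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                                   # random symmetric PSD matrix
phi = 0.3
M = np.linalg.inv(A + phi * np.eye(5))
print(np.linalg.norm(M, 2) <= 1 / phi)        # ||(A + phi I)^{-1}|| <= 1/phi
print(np.linalg.norm(M @ A, 2) <= 1)          # ||(A + phi I)^{-1} A|| <= 1
```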
Theorem 2.1 Under the conditions of Assumption 2.1, if $f$ is bounded below, then Algorithm 2.1 either terminates in finitely many iterations or satisfies
$$\liminf_{k \to \infty} \|g_k\| = 0.$$
Proof. The proof is similar to that in (Weijun Zhou, 2013). We argue by contradiction. If the theorem is not true, then there exist a positive $\tau$ and an integer $\tilde{k}$ such that
$$\|g_k\| \ge \tau \quad \text{for all } k \ge \tilde{k}.$$
Without loss of generality, we can suppose $\|g_k\| \ge \tau$ for all $k$. Let $T$ denote the set of indices of the successful iterations, i.e., those with $r_k \ge p_0$. We now analyze two cases, according to whether $T$ is finite or not.
Case (1): $T$ is finite. Then there exists an integer $k_1$ such that $r_k < p_0$ for all $k \ge k_1$, so that every trial step is rejected and
$$x_{k+1} = x_k, \quad \lambda_{k+1} = 4\lambda_k \quad \text{for all } k \ge k_1.$$
Therefore $\lambda_k \to \infty$. Since $x_{k+1} = x_k$ for all $k \ge k_1$, the quantities $g_k$ and $H_k$ stay fixed while $\mu_k = \alpha_1 \Lambda_k + \lambda_k \|g_k\| \to \infty$, and hence $\|d_k\| \to 0$ and $\|\hat{d}_k\| \to 0$. From (10) and $\|g_k\| \ge \tau$, we obtain, for all large $k$,
$$\mathrm{Pred}_k \ge \gamma_1 \|d_k\|,$$
where $\gamma_1$ is a positive constant.
Moreover, the Lipschitz continuity of $g$ and $H$ gives $|\mathrm{Ared}_k - \mathrm{Pred}_k| = O(\|d_k\|^2)$, which implies that $r_k \to 1$. Hence, by the updating rule in Step 5, there exists a positive constant $\gamma_2$ such that $\lambda_k \le \gamma_2$ holds for all large $k$, which contradicts $\lambda_k \to \infty$.
Case (2): $T$ is infinite. Since $f$ is bounded below and every successful iteration reduces $f$ by at least $p_0\,\mathrm{Pred}_k$, we deduce that $\mathrm{Pred}_k \to 0$ along the successful iterations, which together with $\|g_k\| \ge \tau$ forces $\lambda_k \to \infty$. Arguing as in Case (1), however, $r_k \to 1$, so there exists a positive constant $\gamma_4 > m$ such that $\lambda_k \le \gamma_4$ holds for all sufficiently large $k$, which gives a contradiction. The proof is completed.

Local Convergence of Algorithm 2.1
In this section, we show that the sequence generated by Algorithm 2.1 converges to some solution of (1) cubically.
To study the local convergence properties of Algorithm 2.1, we make the following assumptions.
Assumption 3.1 (a) The sequence {x k } generated by Algorithm 2.1 converges to x * ∈ X and lies in some neighbourhood of x * .
(b) $\|g(x)\|$ provides a local error bound on some neighbourhood $N(x^*, b_1)$ for (2), that is, there exists a constant $c_1 > 0$ such that
$$c_1\, \mathrm{dist}(x, X) \le \|g(x)\| \quad \text{for all } x \in N(x^*, b_1).$$
(c) The Hessian $H(x)$ is Lipschitz continuous on $N(x^*, b_1)$, i.e., there exists a positive constant $L_1$ such that
$$\|H(x) - H(y)\| \le L_1 \|x - y\| \quad \text{for all } x, y \in N(x^*, b_1).$$
Note that if $H(x)$ is nonsingular at a solution, then $\|g(x)\|$ provides a local error bound on a neighbourhood of that solution. However, the converse is not necessarily true; for examples, please refer to (N. Yamashita, 2001; Dong-hui Li, 2004). Hence, the local error bound condition is weaker than nonsingularity.
By Assumption 3.1 (c), we know that
$$\|g(y) - g(x) - H(x)(y - x)\| \le L_1 \|y - x\|^2 \quad \text{for all } x, y \in N(x^*, b_1),$$
and there exists a constant $L_2 > 0$ such that
$$\|g(x) - g(y)\| \le L_2 \|x - y\| \quad \text{for all } x, y \in N(x^*, b_1).$$
In the following, we denote by $\bar{x}_k$ the vector in the solution set $X$ that satisfies
$$\|\bar{x}_k - x_k\| = \mathrm{dist}(x_k, X).$$
The following lemma gives the relationship between the trial step $s_k$ and the distance from $x_k$ to the solution set.
Lemma 3.1 Under the conditions of Assumption 3.1, there exists a positive constant $c_2$ such that, for all sufficiently large $k$,
$$\|d_k\| \le c_2 \|x_k - \bar{x}_k\| \quad \text{and} \quad \|s_k\| \le c_2 \|x_k - \bar{x}_k\|.$$
Proof. Since $d_k$ is the minimizer of $\varphi_{k,1}$, we have $\varphi_{k,1}(d_k) \le \varphi_{k,1}(\bar{x}_k - x_k)$. Together with (37) and the bounds of Assumption 3.1, this yields $\|d_k\| \le c\, \|x_k - \bar{x}_k\|$ for some constant $c > 0$ and all sufficiently large $k$. Applying the same argument to $\varphi_{k,2}$ bounds $\|\hat{d}_k\|$, and combining the two estimates gives the bound on $\|s_k\| = \|d_k + \hat{d}_k\|$.

The Boundedness of $\lambda_k$ and $\Lambda_k$
In the following, we will show that $\lambda_k$ and $\Lambda_k$ are bounded above, which will play a key role in the next subsection.
Lemma 3.2 Under the conditions of Assumption 3.1, there exists a positive constant $T > m$ such that $\lambda_k \le T$ holds for all sufficiently large $k$.
Proof. From (10), (37) and (40), we can show that $r_k \to 1$ for all sufficiently large $k$, so by the updating rule in Step 5 there exists a positive constant $T > m$ such that $\lambda_k \le T$ holds for all sufficiently large $k$.

Since $H(x)$ is symmetric, we can suppose that $H(x)$ has the eigenvalue decomposition
$$H(x) = \begin{bmatrix} V_{k,1} & V_{k,2} \end{bmatrix} \begin{bmatrix} \Sigma_{k,1} & \\ & \Sigma_{k,2} \end{bmatrix} \begin{bmatrix} V_{k,1}^T \\ V_{k,2}^T \end{bmatrix},$$
where $\Sigma_{k,1}$ is a diagonal matrix converging to the nonsingular diagonal matrix $\Sigma^*_1$ of the nonzero eigenvalues of $H(x^*)$, $\mathrm{rank}(\Sigma_{k,1}) = \mathrm{rank}(H(x^*))$, and $\Sigma_{k,2}$ converges to zero as $x \to x^*$. In the following, we omit the subscript $k$ in $\Sigma_{k,i}$ and $V_{k,i}$ $(i = 1, 2)$ and write $H(x_k)$ as
$$H(x_k) = V_1 \Sigma_1 V_1^T + V_2 \Sigma_2 V_2^T.$$

Lemma 3.5 (W. Sun, 2006). If the sequence $\{x_k\}$ converges superlinearly to $x^*$, then
$$\lim_{k \to \infty} \frac{\|x_{k+1} - x_k\|}{\|x_k - x^*\|} = 1.$$
Proof. See (W. Sun, 2006).
Therefore, we have from Lemma 3.5, (50), (51) and (44) that there exist two positive constants $\beta_3$ and $\beta_4$ such that
$$\beta_3 \|x_k - \bar{x}_k\| \le \|x_k - x^*\| \le \beta_4 \|x_k - \bar{x}_k\|,$$
which means that $\|x_k - x^*\|$ is equivalent to $\|x_k - \bar{x}_k\|$.