Estimation of Multivariate Smooth Functions via Convex Programs

A new method for estimating an unknown, multivariate function from noisy data is proposed for the case where the unknown function is assumed to be smooth. The proposed method minimizes the smoothness of a function while imposing an upper bound on the sum of squared errors between the function and the data set. The proposed estimator is designed to be numerically sound: it eliminates the dependence on the artificially plugged-in parameters that traditional methods use, and it avoids the ill-conditioned numerical settings that traditional methods suffer from. Hence, it is expected to perform better numerically than existing estimators. We prove the existence of the proposed estimator and show that it can be computed through a convex program. Empirical studies illustrate that the proposed method is effectively applied to the problem of estimating the average payoff of a stock option that is contingent on two different stocks.


Introduction
We look at the problem of estimating an unknown function f* : Ω ⊂ R^d → R, where R^d is the d-dimensional Euclidean space. We presume that we observe r noisy measurements Y_i1, ..., Y_ir of f* at each t_i = (x_1(i), ..., x_d(i)) ∈ Ω and that Y_ij = f*(t_i) + ε_ij for 1 ≤ i ≤ n and 1 ≤ j ≤ r, where ε_i1, ..., ε_ir are independent and identically distributed (iid) random variables with a mean of 0 and a variance of σ_i² < ∞. In many applications, f* is assumed to be smooth. We thus assume that f* belongs to X, the space of functions on Ω whose partial derivatives of total order m are square integrable over Ω. In such a case, one naturally tries to fit a function f to the data by minimizing the smoothness of the fitted function f, which can be measured by

J_m^d(f) = ∑_{α_1+···+α_d = m} ( m! / (α_1! ··· α_d!) ) ∫_Ω ( ∂^m f(x) / ∂x_1^{α_1} ··· ∂x_d^{α_d} )² dx.

While minimizing the smoothness J_m^d of f over X, we also want to ensure that the fitted function f is close to the data points. The closeness between f and the data points can be measured by

I(f) = (1/n) ∑_{i=1}^n ( f(t_i) − Ȳ_i )²,

where Ȳ_i = ∑_{j=1}^r Y_ij / r for 1 ≤ i ≤ n. An upper bound on I(f*) can be obtained by noticing that

I(f*) = (1/n) ∑_{i=1}^n ( Ȳ_i − f*(t_i) )²
      ≈ (1/n) ∑_{i=1}^n N(0, σ_i²/r)²  for large r by the weak law of large numbers
      ≈ (1/(nr)) ∑_{i=1}^n σ_i² =: S²  for large n by the strong law of large numbers,

where N(0, σ²) is normally distributed with a mean of 0 and a variance of σ². The notation "≈" expresses equality in some appropriate sense.
Since σ_i² can be estimated by the sample variance s_i² = (1/(r−1)) ∑_{j=1}^r (Y_ij − Ȳ_i)², an estimate Ŝ² of S² can be obtained by

Ŝ² = (1/(nr)) ∑_{i=1}^n s_i².   (1)

So, we propose f̂_n, the solution to the following minimization problem, as our estimate of f*:

(A)  minimize J_m^d(f) over f ∈ X subject to (1/n) ∑_{i=1}^n ( f(t_i) − Ȳ_i )² ≤ Ŝ².

Problem (A) can serve as an alternative to the following traditional problem:

(B)  minimize (1/n) ∑_{i=1}^n ( f(t_i) − Ȳ_i )² + λ J_m^d(f) over f ∈ X

for some λ > 0, which was studied extensively in the literature by Duchon (1977), Meinguet (1979), and Cox (1984), among others. Despite the rich theory established by numerous researchers, the performance of the solution to Problem (B) depends on the parameter λ, and the procedure for finding the right value of λ is computationally burdensome. Furthermore, the solution to Problem (B) is computed by solving a system of linear equations involving an ill-conditioned matrix whose diagonal elements are all zero (page 145 of Dierckx, 1993). To tackle this ill-conditioned setting, additional numerical procedures must be performed.
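To make the plug-in bound concrete, here is a minimal sketch (in Python rather than the paper's MATLAB) of the estimate of Equation (1), under our reading that Ŝ² averages the per-point sample variances s_i² and divides by r:

```python
from statistics import variance

def s_hat_squared(Y):
    """Estimate S^2 = (1/(n*r)) * sum_i sigma_i^2 from the raw
    measurements Y[i][j] = Y_ij, using the sample variance of each
    replication group as an estimate of sigma_i^2."""
    n = len(Y)
    r = len(Y[0])
    # statistics.variance() uses the unbiased (r - 1) denominator
    return sum(variance(row) for row in Y) / (n * r)

# Toy data: n = 2 design points, r = 3 replications each.
Y = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
print(s_hat_squared(Y))  # (1.0 + 4.0) / 6
```

Note that the estimate requires r ≥ 2 replications per design point; with r = 1 the variance must be known or estimated some other way.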
In contrast, the numerical performance of the solution to Problem (A) does not depend on any artificially plugged-in parameters, since one can obtain a good estimate of S² via Equation (1). Moreover, we prove that Problem (A) always has a solution and that the solution can be found by solving a convex program. There are many efficient algorithms that solve convex programs with guaranteed convergence, so our convex program formulation makes it possible to utilize such algorithms to find the solution to Problem (A) with guaranteed convergence. The numerical results in Section 3 illustrate that the solution to Problem (A) is computed successfully within a few seconds.
There are many applications in which the independent variable of a regression function lies in a multi-dimensional space.
Even though Problem (A) was mentioned as a possible approach to the estimation of a multivariate smooth function in Equations (8.2) and (8.3) on page 139 of Dierckx (1993), the existence of a solution to Problem (A) had not been proved, and no numerical procedure with guaranteed convergence had been proposed. This paper provides these missing foundations by proving the existence of the solution to Problem (A) and by providing numerical procedures, based on convex programs, with guaranteed convergence.
This paper is organized as follows. In Section 2, we prove the existence of the solution to Problem (A) and provide the convex program formulation of Problem (A). In Section 3, we present numerical results to illustrate that our proposed estimator successfully recovers the underlying true function as n increases to infinity. Concluding remarks are included in Section 4.

Proposed Formulation
In this section, we prove in Proposition 1 that Problem (A) always has a solution. Proposition 1 also describes how Problem (A) can be solved with a convex program.
To describe the details of Proposition 1, we need some preliminaries. Let P be the set of all polynomials over Ω of total degree less than m. It should be noted that P is the vector space of dimension K = (m−1+d choose d) spanned by the monomials of total degree less than m. For example, when d = 2, Ω = [0, 1]², and m = 3, P is spanned by φ_1, ..., φ_6, which are defined by

φ_1(x_1, x_2) = 1, φ_2(x_1, x_2) = x_1, φ_3(x_1, x_2) = x_2, φ_4(x_1, x_2) = x_1², φ_5(x_1, x_2) = x_1 x_2, φ_6(x_1, x_2) = x_2²

for all (x_1, x_2) ∈ Ω. We will denote the K monomials of total degree less than m by φ_1, ..., φ_K.
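The dimension count K can be checked mechanically. A short sketch (illustrative Python; the paper's own code is MATLAB) that enumerates the exponent tuples of the monomials of total degree less than m:

```python
from itertools import product
from math import comb

def monomial_exponents(d, m):
    """All exponent tuples (a_1, ..., a_d) with a_1 + ... + a_d < m,
    i.e. the monomials spanning P."""
    # Each individual exponent is < m whenever the total degree is < m,
    # so enumerating product(range(m), ...) misses nothing.
    return [a for a in product(range(m), repeat=d) if sum(a) < m]

# d = 2, m = 3 reproduces the K = 6 monomials 1, x1, x2, x1^2, x1*x2, x2^2.
exps = monomial_exponents(2, 3)
print(len(exps), comb(3 - 1 + 2, 2))  # 6 6
```

The same count gives K = 4 for the d = 3, m = 2 setting of Section 3.1 and K = 10 for the d = 2, m = 4 setting of Section 3.2.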
A set {u_1, ..., u_K} of distinct points of Ω is called unisolvent for P if, for any real numbers z_1, ..., z_K, there exists a unique p ∈ P satisfying p(u_i) = z_i for 1 ≤ i ≤ K. It should be noted that {u_1, ..., u_K} is unisolvent for P if and only if the K × K matrix M = (φ_j(u_i))_{1 ≤ i, j ≤ K} is nonsingular. We are now ready to present Proposition 1.
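The matrix characterization of unisolvency suggests a direct numerical check. A sketch (illustrative Python; the 1e-12 singularity tolerance is an arbitrary choice, not from the paper):

```python
def is_unisolvent(points, basis):
    """Check unisolvency: the K x K matrix M with M[i][j] = phi_j(u_i)
    must be nonsingular.  Uses Gaussian elimination with partial
    pivoting to test for (numerical) singularity."""
    M = [[phi(u) for phi in basis] for u in points]
    K = len(M)
    for col in range(K):
        piv = max(range(col, K), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return False          # a zero pivot: M is singular
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, K):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return True

# d = 1, m = 2: P is spanned by 1 and x, so two distinct points are
# unisolvent while two coincident points are not.
basis = [lambda x: 1.0, lambda x: x]
print(is_unisolvent([0.0, 1.0], basis))   # True
print(is_unisolvent([1.0, 1.0], basis))   # False
```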
Proposition 1 Assume that 2m > d and that {t_1, ..., t_n} contains a subset {u_1, ..., u_K} which is unisolvent for P. Then there exists a solution f̂_n to Problem (A). Furthermore,

(a) f̂_n : Ω → R can be represented as f̂_n(t) = ∑_{i=1}^{K+n} â_i φ_i(t) for all t ∈ Ω, for some functions φ_{K+1}, ..., φ_{K+n} determined by {t_1, ..., t_n} (Talmi and Gilat, 1977);

(b) (â_1, ..., â_{K+n}, ŷ_1, ..., ŷ_n) is the solution to the following program with the decision variables a_1, ..., a_{K+n}, y_1, ..., y_n ∈ R:

minimize J_m^d( ∑_{i=1}^{K+n} a_i φ_i )
subject to ∑_{i=1}^{K+n} a_i φ_i(t_j) = y_j for 1 ≤ j ≤ n,   (2)
(y_1, ..., y_n) ∈ C,

where C = { (y_1, ..., y_n) ∈ R^n : (1/n) ∑_{i=1}^n (y_i − Ȳ_i)² ≤ Ŝ² };

(c) Problem (2) is a convex program.

It should be noted that C is a nonempty, closed, and bounded subset of R^n.
Proof. By Talmi and Gilat (1977), for any (y_1, ..., y_n) ∈ R^n, there exists a unique f ∈ X such that

1. f(t) = ∑_{i=1}^{K+n} a_i φ_i(t) for all t ∈ Ω and for some a_1, ..., a_{K+n},
2. f(t_i) = y_i for 1 ≤ i ≤ n, and
3. J_m^d(f) ≤ J_m^d(g) for every g ∈ X satisfying g(t_i) = y_i for 1 ≤ i ≤ n.

The a_i's in 1 are uniquely determined by the following linear system with K+n variables and K+n linear equations:

∑_{i=1}^{K+n} a_i φ_i(t_j) = y_j for 1 ≤ j ≤ n,   ∑_{i=1}^n a_{K+i} φ_j(t_i) = 0 for 1 ≤ j ≤ K.   (3)

By the uniqueness of the a_i's, the linear system (3) is nonsingular, and hence the mapping from (y_1, ..., y_n) to (a_1, ..., a_{K+n}) is linear and continuous. Since C is nonempty, closed, and bounded, the minimum of J_m^d(f) over the functions f determined by (y_1, ..., y_n) ∈ C is attained; let f̂_n denote a minimizing function, with corresponding values (ŷ_1, ..., ŷ_n) and (â_1, ..., â_{K+n}). Let g ∈ X be any feasible solution to Problem (A). By Talmi and Gilat (1977), there exists a function ḡ ∈ X and a_1, ..., a_{K+n} that satisfy 1, 2, and 3 with y_i replaced with g(t_i), resulting in J_m^d(f̂_n) ≤ J_m^d(ḡ) ≤ J_m^d(g). Hence, f̂_n is a minimizer of Problem (A), proving (a). To prove (b), we first notice that (â_1, ..., â_{K+n}, ŷ_1, ..., ŷ_n) is a feasible solution to (2). Let (ã_1, ..., ã_{K+n}, ỹ_1, ..., ỹ_n) be a feasible solution to (2). By Talmi and Gilat (1977), there exists a unique f̃ ∈ X such that

i. f̃(t) = ∑_{i=1}^{K+n} a_i φ_i(t) for all t ∈ Ω and for some a_1, ..., a_{K+n},
ii. f̃(t_i) = ỹ_i for 1 ≤ i ≤ n, and
iii. J_m^d(f̃) ≤ J_m^d(g) for every g ∈ X satisfying g(t_i) = ỹ_i for 1 ≤ i ≤ n;

in particular, J_m^d(f̃) ≤ J_m^d(∑_{i=1}^{K+n} ã_i φ_i). Since (ỹ_1, ..., ỹ_n) ∈ C, we also have J_m^d(f̂_n) ≤ J_m^d(f̃), so the objective value of (â_1, ..., â_{K+n}, ŷ_1, ..., ŷ_n) is no larger than that of (ã_1, ..., ã_{K+n}, ỹ_1, ..., ỹ_n), proving (b).
To prove (c), we note that J_m^d(∑_{i=1}^{K+n} a_i φ_i) is a positive semidefinite quadratic form in (a_1, ..., a_{K+n}) ∈ R^{K+n}, and hence Problem (2) is a quadratically constrained quadratic program. Thus, it follows that Problem (2) is a convex program. ∎
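The interpolation step in the proof, solving a (K+n)-by-(K+n) linear system for the coefficients a_i, can be sketched in a toy setting. Everything here is illustrative Python, not the paper's construction: we take d = 1 and m = 1 (so K = 1 and P contains only constants), assume the radial term |t − t_i| plays the role of φ_{K+i}, and impose the side condition that the radial coefficients sum to zero.

```python
def solve(A, b):
    """Tiny Gaussian-elimination solver with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            M[r] = [x - fac * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Toy instance: K = 1 (constants), radial term |t - t_i|, n = 2 points.
t, y = [0.0, 1.0], [0.0, 1.0]
n = len(t)
# Interpolation rows: a_1 + sum_i a_{1+i} |t_j - t_i| = y_j.
A = [[1.0] + [abs(tj - ti) for ti in t] for tj in t]
# Side condition: sum_i a_{1+i} * phi_1(t_i) = sum_i a_{1+i} = 0.
A.append([0.0] + [1.0] * n)
a = solve(A, y + [0.0])

def f(s):
    return a[0] + sum(a[1 + i] * abs(s - t[i]) for i in range(n))

print([round(f(s), 6) for s in t])  # → [0.0, 1.0]: the system interpolates
```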

Numerical Results
We now examine how the proposed estimator f̂_n can be calculated in two numerical settings. In the first setting, we consider a simple case in which a smooth trivariate function is observed with iid normal errors.
In the second setting, we look at the case where the true function f * (x 1 , x 2 ) is the average payoff of a stock option that is contingent on two different stocks, where x 1 and x 2 denote the prices of the underlying stocks.
In both settings, we calculate the mean squared error (MSE) of the proposed estimator f̂_n, defined by

E[ (1/n) ∑_{i=1}^n ( f̂_n(t_i) − f*(t_i) )² ],

and compare it to the MSE of the crude estimator (Ȳ_1, ..., Ȳ_n), defined by

E[ (1/n) ∑_{i=1}^n ( Ȳ_i − f*(t_i) )² ].

The numerical results show that the MSE of the proposed estimator converges to 0 as n increases to infinity, while the MSE of the crude estimator converges to a positive constant. This phenomenon empirically shows that our proposed estimator converges to the underlying function in the L² sense as n increases to infinity.
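The behavior of the crude estimator is easy to see in isolation: since Ȳ_i − f*(t_i) is an average of r iid errors, its mean square is σ_i²/r regardless of n, so the crude MSE cannot vanish as n grows. A small illustrative simulation (Python; the parameter values are arbitrary, not from the paper):

```python
import random

def crude_mse(n, r, sigma, seed=0):
    """Monte Carlo estimate of E[(1/n) sum_i (Ybar_i - f*(t_i))^2] for
    the crude estimator when every sigma_i = sigma; the averaged errors
    dominate, so this stays near sigma^2 / r no matter how large n gets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Ybar_i - f*(t_i) is the mean of r iid N(0, sigma^2) errors.
        dev = sum(rng.gauss(0.0, sigma) for _ in range(r)) / r
        total += dev ** 2
    return total / n

for n in (100, 1000, 10000):
    print(n, round(crude_mse(n, r=5, sigma=2.0), 3))  # hovers near 4/5 = 0.8
```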
We conducted all simulations on a 64-bit computer with an Intel Core i7-6700K CPU at 4 GHz and 32 GB of RAM. All simulations were programmed in MATLAB R2016a.

A Simple Case
We assume that f* : [1, 3]³ → R, that Y_ij = f*(t_i) + ε_ij for 1 ≤ i ≤ n and 1 ≤ j ≤ r with r = 1, and that the ε_ij's are iid normal random variables with a mean of 0 and a variance of 4.
We assume that it is known a priori that f* has square-integrable partial derivatives of total order 2, so that m = 2, d = 3, and K = 4. Then, our proposed estimator f̂_n can be computed as

f̂_n(x_1, x_2, x_3) = ∑_{i=1}^{n+4} â_i φ_i(x_1, x_2, x_3),

where φ_1, ..., φ_4 are the monomials of total degree less than 2, φ_{4+i} is the basis function associated with t_i for 1 ≤ i ≤ n and (x_1, x_2, x_3) ∈ [1, 3]³, and (â_1, ..., â_{n+4}) is computed from Problem (6), the instance of the program (2) for this setting. The triple integral appearing in the objective function of Problem (6) is computed in MATLAB using the functions "triplequad" and "integral3". Problem (A) is then solved using CVX, developed by Grant and Boyd (2014).
The MSE of f̂_n is calculated by

(1/n) ∑_{i=1}^n ( f̂_n(t_i) − f*(t_i) )²,

and is compared to the MSE of the crude estimator (Ȳ_1, ..., Ȳ_n),

(1/n) ∑_{i=1}^n ( Ȳ_i − f*(t_i) )².

We calculate the 95% confidence intervals of the MSE using 400 iid replications and report them in Table 1. We also measure the time spent calculating the proposed estimator and report the averages over 400 iid replications in Table 1. We observe that the MSEs of the proposed estimator are smaller than those of the crude estimator, and that the MSE of the proposed estimator converges to 0 as n increases. We can thus deduce that the proposed estimator converges to the true function as n increases to infinity.
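For any set of iid replications, the reported intervals can be formed with the usual normal approximation for the mean. A sketch (Python; the sample values below are made up for illustration):

```python
from statistics import mean, stdev

def ci95(samples):
    """Normal-approximation 95% confidence interval for the mean of
    iid replications (as used for the MSE tables)."""
    m = mean(samples)
    half = 1.96 * stdev(samples) / len(samples) ** 0.5
    return m - half, m + half

# Hypothetical MSE values from 5 replications (a real run uses 400).
lo, hi = ci95([0.9, 1.1, 1.0, 0.8, 1.2])
print(lo, hi)
```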
Table 1. The 95% confidence intervals of the MSE and the average amount of time spent.

Table 2 reports the 95% confidence intervals of the MSE and the average amounts of time spent to compute the proposed estimator, using 400 iid replications.
Table 2. The 95% confidence intervals of the MSE and the average amount of time spent.

Estimation of "Gamma" of a Stock Option
We consider the case where f* : [0.5, 1.5]² → R is given so that f*(x_1, x_2) is the average payoff of a stock option contingent on the stock price x_1 of Stock 1 and the stock price x_2 of Stock 2. The payoff structure of this stock option is as follows. The option is active between time 0 and time T = 365. The prices of Stock 1 and Stock 2 at time t ∈ [0, T] are denoted by S¹_t and S²_t, respectively. There are 6 time instances, say d_1, ..., d_6, when the option can expire early: on day d_i, the option expires and generates a payoff of $r_i if both S¹_{d_i}/S¹_0 and S²_{d_i}/S²_0 are above a certain price b_i, for 1 ≤ i ≤ 6. If the option does not expire early, it expires at time T = 365 as follows. If both S¹_t/S¹_0 and S²_t/S²_0 stay above the price b over the entire lifetime t ∈ [0, T] of the option, the option generates a payoff of $1. If either S¹_t/S¹_0 or S²_t/S²_0 drops below b at any time instance t between 0 and T = 365, the option generates a payoff equal to the minimum of S¹_T/S¹_0 and S²_T/S²_0.
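For concreteness, the payoff logic above can be sketched as follows. This is purely illustrative Python: all numeric parameters (the dates d_i, the barriers b_i and b, the early payoffs r_i, and the GBM drift and volatility) are placeholder values rather than the paper's, and the simulation starts both paths at time 0 rather than 60 days before T.

```python
import random
from math import exp, sqrt

# Placeholder parameters -- NOT the paper's values.
D = [60, 120, 180, 240, 300, 330]   # early-expiry days d_1, ..., d_6 (assumed)
B = [1.0] * 6                       # early-expiry barriers b_i (assumed)
R = [1.0] * 6                       # early payoffs r_i (assumed)
b = 0.6                             # lifetime barrier (assumed)
T, mu, vol = 365, 0.0, 0.2          # horizon, GBM drift, volatility (assumed)

def gbm_path(s0, rng, days=T):
    """Daily geometric Brownian motion path starting from s0."""
    path, s, dt = [s0], s0, 1.0 / 365.0
    for _ in range(days):
        s *= exp((mu - 0.5 * vol ** 2) * dt + vol * sqrt(dt) * rng.gauss(0, 1))
        path.append(s)
    return path

def payoff(x1, x2, rng):
    """One simulated payoff with initial prices x1, x2 (normalized by
    the time-0 prices, per the payoff description)."""
    p1, p2 = gbm_path(x1, rng), gbm_path(x2, rng)
    for i, day in enumerate(D):
        if p1[day] / x1 >= B[i] and p2[day] / x2 >= B[i]:
            return R[i]                        # early expiry on day d_i
    if min(min(p1) / x1, min(p2) / x2) >= b:
        return 1.0                             # barrier never breached
    return min(p1[-1] / x1, p2[-1] / x2)       # breached: worst-of payoff

# One Monte Carlo estimate of the average payoff at (x1, x2) = (1, 1),
# i.e. a single noisy observation of f*(1, 1) under these assumptions.
rng = random.Random(1)
est = sum(payoff(1.0, 1.0, rng) for _ in range(1000)) / 1000
print(round(est, 4))
```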
We let {t_1, ..., t_n} ⊂ [0.5, 1.5]² denote the design points. We assume that the current time is 60 days before T = 365. For each t_i = (x_1(i), x_2(i)), we generate two independent sample paths of a geometric Brownian motion as the trajectories of the prices of Stock 1 and Stock 2 between the current time and T = 365, with the current stock prices of Stock 1 and Stock 2 equal to x_1(i) and x_2(i), respectively. We then compute the corresponding payoff of the stock option. The following parameters are used to compute the payoff of the stock option: We set r = 5, so 5 iid replications Y_i1, ..., Y_i5 are generated at each (x_1(i), x_2(i)), and Ȳ_i = ∑_{j=1}^5 Y_ij / 5. The second derivative of the option price with respect to the underlying stock price, often called the "gamma," is of particular interest to portfolio managers because it determines how many shares of the option must be sold or bought to stay in the delta-hedged position. Moreover, the gamma is often assumed to be a smooth function of the underlying stock prices because a smooth gamma curve suggests consistent buying or selling strategies over the domain of possible stock prices. Hence, it is natural for us to assume that f* has smooth second derivatives. Since we can measure the smoothness of the second derivative of f* using J_m^d with m = 4, our proposed estimator is computed as

f̂_n(x_1, x_2) = ∑_{i=1}^{n+10} â_i φ_i(x_1, x_2),

where φ_1, ..., φ_10 are the monomials of total degree less than 4, φ_{10+i} is the basis function associated with t_i for 1 ≤ i ≤ n and (x_1, x_2) ∈ [0.5, 1.5]², and (â_1, ..., â_{n+10}) is calculated from Problem (7), the instance of the program (2) for this setting. Problem (7) is solved with CVX. The double integral appearing in the objective function of Problem (7) is computed numerically in MATLAB.
The MSE of the proposed estimator f̂_n is calculated by

(1/n) ∑_{i=1}^n ( f̂_n(t_i) − f*(t_i) )²,

and is compared to the MSE of the crude estimator (Ȳ_1, ..., Ȳ_n),

(1/n) ∑_{i=1}^n ( Ȳ_i − f*(t_i) )².

We calculate the 95% confidence intervals of the MSE using 400 iid replications and report them in Table 3. We also measure the time spent calculating the proposed estimator and report the averages over 400 iid replications in Table 3. We observe that the MSEs of the proposed estimator are smaller than those of the crude estimator, and that the MSE of the proposed estimator converges to 0 as n increases. We can thus deduce that the proposed estimator converges to the true function as n increases to infinity.