Polynomial Interpolation of Normal Conditional Expectation

In this paper we are interested in approximating the conditional expectation of a given random variable X with respect to a standard normal variable Z ~ N(0, 1). We show that the conditional expectation E(X|Z) can be interpolated by a polynomial function of Z of degree N, φ_N(Z), where N is the number of observations recorded for the conditional expectation E(X|Z = z). A pointwise error estimate is proved under a reasonable condition on the random variable X.


Introduction
Since the outstanding Stone-Weierstrass result on the approximation of continuous functions by polynomials on compact sets, see (Rudin, 1976) and (Prolla, 1993), polynomial interpolation has become the cornerstone of numerical analysis and approximation theory. Despite its importance, polynomial interpolation has not been used to interpolate stochastic processes or random variables. This is understandable in the case of a single random variable, because it seems very strange to interpolate one value with a polynomial function. It can be done, however, if we consider the conditional expectation of a given random variable X with respect to another known one, Z. Indeed, when X has finite variance this conditional expectation can be expressed as E(X|Z) = φ(Z), where φ is square integrable with respect to the probability distribution dµ_Z of Z; the function φ is called the regression function.
This starts to make sense for several reasons. First, in practical situations we often cannot observe the random variable X unless we fix a value of another known variable, in our case Z. Second, the conditional expectation E(X|Z) is the best approximation of X as a function of Z. A third reason is due to a result in (Krzysztof K., 2005), where it is proved that any random variable X can be approximated by the conditional expectation E(Y|U) of another random variable Y, where U is a fixed sub-σ-algebra.
In the present paper our main interest is to interpolate the conditional expectation E(X|Z) by a polynomial function of Z, where X is supposed to have finite variance and Z follows the standard normal distribution N(0, 1). Our particular choice of the normal distribution for Z is justified by the Central Limit Theorem, which tells us that every "thing" is approximately normal "on average", see (Billingsley, 1995). Note that both variables X and Z are defined on the same probability space (Ω, F, P). A similar interest appears in (Lando T. & Ortobelli S., 2015), where the authors approximate the density function of the conditional expectation E(X|Z) in a more general context in which Z is not necessarily normally distributed, while in the present work we approximate the regression function itself, φ(Z) = E(X|Z).
It is known that the conditional expectation E(X|Z) of X given Z can be expressed as a function of Z, φ(Z), for some φ ∈ L²(R, µ), where µ is the standard Gaussian probability measure. For more details about the properties of conditional expectation one may consult (Billingsley, 1995) and (Sheldon M. R., 2007), for instance.
Recall that the family of Hermite polynomials {h_n(z) : n ≥ 0} is defined by its generating function

exp(tz − t²/2) = Σ_{n≥0} h_n(z) tⁿ/n!,   t, z ∈ R,

and that these polynomials satisfy the orthogonality relation

∫_R h_n(z) h_m(z) dµ(z) = n! δ_{nm},

where δ_{nm} = 1 if n = m and 0 if n ≠ m. Hence the polynomials {h_n : n ∈ N} form an orthogonal basis of the space L²(R, µ).
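As a quick numerical sanity check (an illustration, not part of the paper), the orthogonality relation above can be verified with NumPy's probabilists' Hermite module, assuming the h_n here are the probabilists' Hermite polynomials — the natural choice for the Gaussian measure µ:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import sqrt, pi

# Gauss-HermiteE quadrature: nodes/weights for the weight exp(-z^2/2);
# dividing the weights by sqrt(2*pi) turns it into the Gaussian measure mu.
nodes, weights = He.hermegauss(40)
weights = weights / sqrt(2 * pi)

def gauss_inner(n, m):
    """Compute the inner product ∫ h_n(z) h_m(z) dmu(z) by quadrature."""
    hn = He.hermeval(nodes, [0] * n + [1])   # h_n evaluated at the nodes
    hm = He.hermeval(nodes, [0] * m + [1])
    return np.sum(weights * hn * hm)

print(gauss_inner(3, 3))  # ~ 3! = 6
print(gauss_inner(3, 5))  # ~ 0
```

The quadrature is exact here because the integrands are polynomials of degree well below twice the number of nodes.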
It follows that every function φ ∈ L²(R, µ) can be expanded, in L²(R, µ), as

φ = Σ_{n≥0} a_n h_n,   with a_n = (1/n!) ∫_R φ(z) h_n(z) dµ(z).

This relation implies in particular that

E(X|Z) = φ(Z) = Σ_{n≥0} a_n h_n(Z),

where the convergence holds in L²(Ω, P). Let N ∈ N and denote by φ_N the truncated sum

φ_N = Σ_{n=0}^{N} a_n h_n.

It is clear that the sequence φ_N — of polynomials of degree N — approximates φ in L²(R, µ), and so φ_N(Z) approximates E(X|Z) in L²(Ω, P).
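The coefficients a_n are easy to compute numerically. The sketch below (ours, not the paper's) evaluates a_n = (1/n!) ∫ φ h_n dµ by Gauss-HermiteE quadrature for φ(z) = e^z, for which the generating function gives the closed form a_n = e^{1/2}/n!, assuming the probabilists' normalization used above:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi, exp

nodes, weights = He.hermegauss(60)
weights = weights / sqrt(2 * pi)   # quadrature for the Gaussian measure mu

def hermite_coeff(phi, n):
    """a_n = (1/n!) * ∫ phi(z) h_n(z) dmu(z), computed by quadrature."""
    hn = He.hermeval(nodes, [0] * n + [1])
    return np.sum(weights * phi(nodes) * hn) / factorial(n)

# For phi(z) = exp(z), the generating function gives a_n = exp(1/2) / n!.
coeffs = [hermite_coeff(np.exp, n) for n in range(6)]
expected = [exp(0.5) / factorial(n) for n in range(6)]
```

Since e^z is entire, the 60-node quadrature reproduces the closed-form coefficients essentially to machine precision.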
This leads to the following questions:
1. Under which conditions on X does the function φ admit a continuous version?
2. Under which conditions does the truncated sum φ_N converge pointwise to φ?
3. How can one use this convergence result to interpolate φ polynomially, and hence E(X|Z)?
4. Can the latter approximation be used to simulate the variable E(X|Z)?
The main purpose of this paper is to answer these questions. The remaining parts of the paper are organized as follows. In the next section we state and prove the main result of the paper, Theorem 2.4: the polynomial sequence φ_N converges pointwise, and uniformly on compact sets, to the function E(X|Z = •) = φ(•), which implies the existence of a continuous version of φ; this answers questions 1-3 at the same time. The last section is devoted to a numerical illustration of our result, where we test the interpolation algorithm and use it to simulate the conditional expectation E(X|Z).

Existence of a Continuous Version
In this section we answer the first question asked in the introduction, namely "Under which conditions on X does the function φ admit a continuous version as a function of L²(R, µ)?". To this end we construct a decreasing sequence of Hilbert subspaces of H_0 = L²(R, µ). Let q be a non-negative integer and define the Hilbert subspace H_q of H_0 as the completion of the subspace given in (7). More details about this sequence of subspaces can be found in (Rezgui, 2017).
i. The choice of the weight 2^q in the definition of the sequence of subspaces in (7) could be replaced by any positive sequence α_q that increases to infinity with q; its role is to make the sequence of Hilbert subspaces decreasing.
ii. The sequence of subspaces in (7) is strictly decreasing, in the sense that H_{q+1} ⊊ H_q for every q ≥ 0.

Before coming to the main point of this section we prove a lemma which controls the L² approximation of the function φ by the polynomial sequence {φ_N}_{N≥0} given by the truncated sum in (6).

Lemma 2.2. Let φ ∈ H_0 = L²(R, µ) and let p ≥ 1 be such that φ ∈ H_p ⊂ H_0. Then for any 0 ≤ q < p and all N ∈ N the truncated sum φ_N in (6) satisfies inequality (8).

Before proving this lemma we state the following remarks:

i. It is known that φ_N converges to φ in H_0, but no general result about the speed of convergence can be derived without a construction of subspaces of H_0 similar to the one above. For instance, if p = 1 then q = 0 and we obtain in particular inequality (9).

ii. If we take φ such that φ(•) = E(X|Z = •) we obtain, instead of (9), inequality (10).

Proof of Lemma 2.2:

Now we are ready to state the main result of this work. We shall show that the conditional expectation function E(X|Z = z) = φ(z) can be approximated pointwise on R, and uniformly on each compact subset of R, by a polynomial function, and thus admits a continuous version. By answering question 1 of the last section, Theorem 2.4 also answers questions 2 and 3.

Theorem 2.4. Let X be a given random variable with finite variance and let Z be a random variable that follows the standard normal distribution N(0, 1). Suppose that the conditional expectation function φ(•) = E(X|Z = •) belongs to the Hilbert subspace H_2. Then the sequence of polynomial random variables {φ_N(Z)}_{N≥1} converges absolutely almost everywhere to the conditional expectation random variable E(X|Z). Moreover, for any compact subset K ⊂ R there exists a positive constant C > 0 such that inequality (11) holds, where Z_K = Z 1_{Z∈K} and 1_{Z∈K} is the indicator function of the subset {Z ∈ K} of Ω.

Proof of Theorem 2.4:
It is known that the Hermite polynomials satisfy the following integral representation, see (Szegö, 1939):

h_n(z) = ∫_R (z + it)ⁿ dµ(t),   z ∈ R.   (12)

This implies in particular that for any n ∈ N and z ∈ R,

|h_n(z)| ≤ ∫_R (z² + t²)^{n/2} dµ(t).   (13)
Using inequality (13) and Stirling's formula, one obtains

|h_n(z)| ≤ C e^{z²/4} (n!)^{1/2} / n^{1/4}

for some constant C > 1, and then

Σ_{n≥0} 2^{-qn/2} |h_n(z)| / (n!)^{1/2} ≤ C_q e^{z²/4},

where C_q > 0 is a constant that depends on q ≥ 1.
The latter implies that if φ is chosen to belong to H_q then it can be approximated pointwise, and uniformly on each compact subset of R, by the polynomial sequence {φ_N}_{N≥0}. In fact,

|φ(z) − φ_N(z)| ≤ Σ_{n>N} |a_n| |h_n(z)|,

which, using inequality (8) of Lemma 2.2 with p = q + 1, gives the desired bound. Since the latter inequality holds for every q ≥ 1, choosing q = 1 gives the pointwise convergence of (φ_N)_N to φ provided φ ∈ H_2. We complete the proof by replacing φ by E(X|Z = •). □
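As a worked example of the H_2 hypothesis of Theorem 2.4 — and assuming the norm ‖φ‖_q² = Σ_n 2^{qn} n! a_n² suggested by the weight 2^q in (7) (the precise definition is the one in (Rezgui, 2017)) — the function φ(z) = e^z, whose Hermite coefficients are a_n = e^{1/2}/n! by the generating function, belongs to every H_q:

```latex
\|\varphi\|_q^2
  = \sum_{n \ge 0} 2^{qn}\, n!\, a_n^2
  = \sum_{n \ge 0} 2^{qn}\, n! \left(\frac{e^{1/2}}{n!}\right)^{\!2}
  = e \sum_{n \ge 0} \frac{2^{qn}}{n!}
  = e\, e^{2^{q}} < \infty .
```

So for such a rapidly decaying coefficient sequence the theorem applies for every q, not only q = 2.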

Numerical Illustration
In this section we use the approximation result of Theorem 2.4 to set up an algorithm for recovering the conditional expectation E(X|Z) = φ(Z) from the knowledge of a finite number of sample conditional values.
Suppose that for each observation Z = z_i, i = 0, …, N, of Z we can observe X and obtain the design

{(z_0, x_0), …, (z_N, x_N)},

where x_i is the value observed for X when Z = z_i. To recover the conditional expectation E(X|Z) from this finite number of conditional observations {x_0, …, x_N} we interpolate the function φ from the design. By Theorem 2.4 the function φ can be pointwise approximated by the sequence of polynomial functions {φ_N}, which allows us to take as interpolant the polynomial

φ_N(z) = Σ_{n=0}^{N} a_n h_n(z).

This leads to the linear system

Σ_{n=0}^{N} a_n h_n(z_i) = x_i,   i = 0, …, N.   (15)

Solving this linear system yields the coefficients a_0, …, a_N of the interpolation polynomial. The weakness of the current algorithm is the absence of a guarantee that this system is solvable: we only know that the Hermite polynomials are linearly independent as functions in L²(R, µ), not that the vectors (h_n(z_i))_{i=0,…,N} are linearly independent in R^{N+1}. Despite this, the following remark saves the situation.

Remark 3.1. Consider the polynomial function of several variables

P(z_0, …, z_N) = det( h_n(z_i) )_{0 ≤ i, n ≤ N}.

The set where P does not vanish is everywhere dense, which means that the matrix in (15) is invertible most of the time.
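A minimal sketch of this step (illustrative, and assuming the probabilists' Hermite normalization) builds the matrix (h_n(z_i)) with NumPy's `hermevander` and solves system (15):

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_interpolate(z, x):
    """Solve system (15): sum_n a_n h_n(z_i) = x_i for the coefficients a_n.

    hermevander builds the (N+1) x (N+1) matrix [h_n(z_i)] of probabilists'
    Hermite polynomials, our assumed normalization for the h_n of the text.
    """
    z, x = np.asarray(z, float), np.asarray(x, float)
    H = He.hermevander(z, len(z) - 1)
    return np.linalg.solve(H, x)   # coefficients a_0, ..., a_N

# Sanity check: phi(z) = z^2 - 1 is exactly h_2, so the recovered
# coefficient vector should be (0, 0, 1, 0, 0, 0).
z = np.linspace(-2.0, 2.0, 6)      # N + 1 = 6 distinct nodes
a = hermite_interpolate(z, z**2 - 1)
```

When the recovered function lies in the span of the first few h_n, the coefficients are reproduced exactly up to rounding.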
We would like to note some differences between our algorithm and standard polynomial interpolation algorithms:

i. In our case accuracy does not depend on the distances between the nodes of the initial design, whereas in most standard polynomial interpolation algorithms it does, see for instance (Phillips, 2003):

f(x) − p_N(x) = (f^{(N+1)}(ξ_x) / (N+1)!) Π_{i=0}^{N} (x − z_i),

where p_N denotes the interpolation polynomial in the standard case.

ii. In most standard polynomial interpolation algorithms accuracy depends strongly on the regularity of the interpolated function, while in our case it does not: we only need to assume that φ belongs to H_2.

Interpolation Test
Let us consider the function f(x) = sin x on [−π, π]. For each N = 5, 10 and 15 we take N equidistant points and form three designs for the function f. We then interpolate using our algorithm and the predefined Matlab routine "spline" (the most accurate one). The plots below show the nodes for N = 5, the sine function as a dashed line, and the two polynomial interpolations as solid lines.

ii. We have also tested the difference between our algorithm and the spline one for different kinds of functions, and we noted that our algorithm gives a better approximation error for polynomial, trigonometric and exponential functions, and also for fractional functions such as f(x) = x/(1 + x).
iii. However, if we consider fractional functions that decrease to zero as x goes to infinity, for instance f(x) = 1/(1 + x⁴), the spline algorithm gives a clearly better approximation error.

iv. We would like to note that although we have run our algorithm many times, we have never encountered a singular matrix in (15), which makes us doubt that it can be singular at all.
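The interpolation test above can be reproduced with the following sketch (ours, not the paper's Matlab code; the spline comparison is omitted), which measures the maximum error of the Hermite-basis interpolant of sin on [−π, π] for N = 5, 10 and 15 equidistant nodes:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_interp_error(n_nodes):
    """Interpolate sin on [-pi, pi] at equidistant nodes in the Hermite
    basis and return the maximum absolute error on a fine grid."""
    z = np.linspace(-np.pi, np.pi, n_nodes)
    a = np.linalg.solve(He.hermevander(z, n_nodes - 1), np.sin(z))
    grid = np.linspace(-np.pi, np.pi, 2001)
    return np.max(np.abs(He.hermeval(grid, a) - np.sin(grid)))

errors = {n: hermite_interp_error(n) for n in (5, 10, 15)}
```

Since sin is entire, the error should shrink quickly as the number of nodes grows.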

Simulation Test
To test the simulation aspect of our algorithm we check whether we can simulate E(X|Z) by using φ_N(Z), knowing X for only a small number N ∈ N of values of Z. For this we consider a correlated normal couple (X, Z) with zero mean and a given covariance matrix. It is easy to check that the conditional expectation is then E(X|Z) = Z/2, and so the regression function is φ(z) = z/2.
Our test consists of first generating n > N values of the normal couple (X, Z) and then selecting N = 5, 10 of them. We interpolate the function φ based on those selected N values, which form our design, and we evaluate the interpolation polynomial on all n initially generated values of Z. Below we display two graphical tests of normality for the obtained polynomials φ_N(Z). Although the graphical tests look quite good, we must say that we did not obtain the same satisfaction with standard numerical normality tests, such as the Kolmogorov-Smirnov test.
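The simulation test can be sketched as follows (an illustration: the covariance matrix below, with unit variances and correlation 1/2 so that E(X|Z) = Z/2, is our assumption standing in for the matrix shown in the paper):

```python
import numpy as np
from numpy.polynomial import hermite_e as He

rng = np.random.default_rng(0)

# Assumed covariance: unit variances, correlation 1/2 => E(X|Z) = Z/2.
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
n, N = 2000, 5
xz = rng.multivariate_normal([0.0, 0.0], cov, size=n)
x, z = xz[:, 0], xz[:, 1]

# Design: N + 1 of the generated pairs (z_i, x_i), spread over the z range
# (taking z-quantiles keeps the interpolation matrix well conditioned).
order = np.argsort(z)
idx = order[np.linspace(0, n - 1, N + 1, dtype=int)]
a = np.linalg.solve(He.hermevander(z[idx], N), x[idx])

# Evaluate phi_N on all n generated values of Z.
phi_N = He.hermeval(z, a)
```

The paper then applies graphical and numerical normality tests to the sample φ_N(Z); here we only check that the polynomial interpolates the design exactly.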
It is clear that our algorithm is more accurate than the spline one in the case f(x) = sin x.
Figure 1. Hermite vs Spline, f(x) = sin x. In the following table we record the errors of the two algorithms for the values N = 5, 10 and 15: