Estimation of Causal Functional Linear Regression Models

where supp β ⊂ T, and T is the closed triangular region whose vertices are (a, a), (b, a) and (b, b). We assume we have an independent sample {(Y_k, X_k) : 1 ≤ k ≤ N} of observations, where the X_k's are functional covariates, the Y_k's are time-order-preserving functional responses, and the E_k, 1 ≤ k ≤ N, are i.i.d. zero-mean functional noise.


Introduction
Functional linear models are generalizations of linear models to functional data. There are three basic functional linear models, according to whether the functional data are the response, the covariates, or both: the functional response model with scalar covariate, the scalar response model with functional covariate, and the functional response model with functional covariate, also called the fully functional model. These models, as well as their generalizations, have a wide range of applications; see the references listed in this note.
Concerning the fully functional model, with the additional assumption that Y and X are both functions of time defined on the same time interval [a, b], then, as already remarked in the Functional Data Analysis literature, either we admit that the values of the covariates at future times impact the values of the response at past and present times, or we admit that the support of β is contained in the closed triangular region T whose vertices are (a, a), (b, a) and (b, b).
The latter alternative is equivalent to the "Volterra type" model and to the "Fredholm type" model, where β is a function defined on a domain that contains T and χ_T is its indicator function. We will call this model the Causal Functional Linear Model.
Model (2) may be directly estimated with expansions in a tensor product basis. This is possible by expanding the function βχ_T, defined on the square [a, b]² by (βχ_T)(x, y) = β(x, y) if (x, y) ∈ T and (βχ_T)(x, y) = 0 if (x, y) ∈ [a, b]² \ T, in a series of tensor products and using equation (2). However, we expect the convergence to be slow, and also the existence of a Gibbs phenomenon at the border of the triangle; more precisely, at all points (u, v) of the line segment joining (a, a) to (b, b) for which there exists ϵ > 0, which may depend on (u, v), such that for every r > 0 the Lebesgue measure of the set {(x, y) ∈ B((u, v), r) : |(βχ_T)(x, y)| > ϵ} is strictly positive. The issue here is that causality forces the slope function to be identically zero on the interior of the upper triangular region with vertices at the points (a, a), (b, b) and (a, b).
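The slow, first-order decay of the tensor product coefficients caused by the jump across the hypotenuse can be checked numerically. The sketch below (an illustration under our own assumptions, not code from this note) takes the simplest case, a constant slope function truncated by χ_T on [0, 1]², and approximates its two-dimensional Fourier coefficients with the FFT; the coefficients along one frequency axis decay only like 1/(2πk).

```python
import numpy as np

# Illustration (hypothetical example): tensor-product Fourier coefficients of
# a slope function truncated by the indicator chi_T of the lower triangle
# {s <= t} on [0, 1]^2.  The jump across the hypotenuse makes the
# coefficients decay only like 1/k, the signature of a Gibbs phenomenon.
n = 512
t = (np.arange(n) + 0.5) / n               # midpoint grid on [0, 1]
tt, ss = np.meshgrid(t, t, indexing="ij")  # tt: response time, ss: covariate time

beta_trunc = 1.0 * (ss <= tt)              # constant slope times chi_T

# 2-D Fourier coefficients approximated by the FFT (midpoint quadrature)
coefs = np.fft.fft2(beta_trunc) / n**2

# |c_{k,0}| is approximately 1/(2*pi*k): slow, first-order decay
for k in (1, 2, 4, 8, 16):
    print(k, abs(coefs[k, 0]), 1 / (2 * np.pi * k))
```

A smooth slope function on the whole square would instead show coefficients decaying faster than any polynomial rate, which is why avoiding the truncation is worthwhile.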
It is possible to obtain estimators of causal functional linear models via finite element methods, as done in (Malfait & Ramsay, 2003) for the case of historical functional linear models.
The aim of this note is to show that we can still obtain good estimates using expansions in tensor products. This is made possible by the construction of a symmetric extension of the slope function and the use of an algebraic trick. No additional Gibbs phenomenon is expected, and convergence is faster than that associated with the direct expansion method, since it is now similar to the convergence obtained when the direct tensor product expansion method is applied to estimate the slope function of a "standard" functional linear model, i.e., one whose slope function is defined on the whole square [a, b]² and does not present an abrupt change of behavior between the lower and upper triangular regions. Recall that the direct estimation of the slope function using tensor product expansions was unsuitable for causal functional linear models because of the possible, indeed very probable, existence of a sharp discontinuity of the slope function at the hypotenuse of the triangular region T.

Estimator Construction
Now, define B as the symmetric extension of β to the square [a, b]²:

B(s, t) = β(s, t) if (s, t) ∈ T, and B(s, t) = β(t, s) otherwise.

Let {ϕ_i : i ∈ I} be an orthonormal basis of L²([a, b]). The expansion of B is written

B(s, t) = Σ_{i,j∈I} b_{ij} ϕ_i(s) ϕ_j(t), with b_{ij} = b_{ji}.

From the causal model definition, and since B = β on T, we have

Y(s) = α(s) + ∫_a^s B(s, t) X(t) dt + E(s), s ∈ [a, b],

and, multiplying by X(s), we obtain

Y(s)X(s) = α(s)X(s) + X(s) ∫_a^s B(s, t) X(t) dt + E(s)X(s).

Integrating this equation for s ∈ [a, b] gives

∫_a^b Y(s)X(s) ds = ∫_a^b α(s)X(s) ds + ∫_a^b ∫_a^s B(s, t) X(t) X(s) dt ds + ∫_a^b E(s)X(s) ds.  (8)

The following simple fact, which holds because both B and the product X(t)X(s) are symmetric in (s, t), will permit us to continue the calculations and make the desired tensor expansions:

∫_a^b ∫_a^s B(s, t) X(t) X(s) dt ds = (1/2) ∫_a^b ∫_a^b B(s, t) X(t) X(s) dt ds.  (9)

Equation (8) now is written

∫_a^b Y(s)X(s) ds = ∫_a^b α(s)X(s) ds + (1/2) ∫_a^b ∫_a^b B(s, t) X(t) X(s) dt ds + ∫_a^b E(s)X(s) ds.  (10)

Now, writing X = Σ_{i∈I} x_i ϕ_i and Y = Σ_{i∈I} y_i ϕ_i, we have

∫_a^b Y(s)X(s) ds = Σ_{i∈I} y_i x_i.  (11)

Analogously, with α = Σ_{i∈I} a_i ϕ_i and E = Σ_{i∈I} e_i ϕ_i, we have

∫_a^b α(s)X(s) ds = Σ_{i∈I} a_i x_i,  (12)

∫_a^b E(s)X(s) ds = Σ_{i∈I} e_i x_i,  (13)

and, expanding B in the tensor product basis,

(1/2) ∫_a^b ∫_a^b B(s, t) X(t) X(s) dt ds = (1/2) Σ_{i,j∈I} b_{ij} x_i x_j.  (14)

Substituting (11), (12), (13) and (14) in (10) we get

Σ_{i∈I} y_i x_i = Σ_{i∈I} a_i x_i + (1/2) Σ_{i,j∈I} b_{ij} x_i x_j + Σ_{i∈I} e_i x_i.  (15)

Note that, from (15), we have

Σ_{i∈I} y_i x_i − Σ_{i∈I} a_i x_i − (1/2) Σ_{i,j∈I} b_{ij} x_i x_j = Σ_{i∈I} e_i x_i.  (16)

If X ≠ 0, squaring both sides of (16), dividing by Σ_{i∈I} x_i², and applying the Cauchy–Schwarz inequality to the right hand side, we write

(Σ_{i∈I} y_i x_i − Σ_{i∈I} a_i x_i − (1/2) Σ_{i,j∈I} b_{ij} x_i x_j)² / Σ_{i∈I} x_i² ≤ Σ_{i∈I} e_i².  (17)

Let us write the sample {Y_k, X_k}, 1 ≤ k ≤ N, and the noise E_k as

Y_k = Σ_{i∈I} y_{ki} ϕ_i, X_k = Σ_{i∈I} x_{ki} ϕ_i and E_k = Σ_{i∈I} e_{ki} ϕ_i.

Now, using (17) for each k and summing over k for 1 ≤ k ≤ N, we get

Σ_{k=1}^N (Σ_{i∈I} y_{ki} x_{ki} − Σ_{i∈I} a_i x_{ki} − (1/2) Σ_{i,j∈I} b_{ij} x_{ki} x_{kj})² / Σ_{i∈I} x_{ki}² ≤ Σ_{k=1}^N Σ_{i∈I} e_{ki}².  (18)

Note that inequality (18) must be fulfilled whatever the noise energy is, and we do not know how high this energy is. Thus, to guarantee its fulfillment, we will minimize its left hand side. This is similar to minimizing the energy of the residuals when fitting a regression model. It leads us to seek the minimum of

F(a, b) = Σ_{k=1}^N (Σ_{i∈I} y_{ki} x_{ki} − Σ_{i∈I} a_i x_{ki} − (1/2) Σ_{i,j∈I} b_{ij} x_{ki} x_{kj})² / Σ_{i∈I} x_{ki}²  (19)

with respect to the a_i and b_{ij}. Setting ∂F/∂a_l = 0 and ∂F/∂b_{lm} = 0, we obtain the following system of equations. For all l ∈ I,

Σ_{k=1}^N (x_{kl} / Σ_{i∈I} x_{ki}²) (Σ_{i∈I} y_{ki} x_{ki} − Σ_{i∈I} a_i x_{ki} − (1/2) Σ_{i,j∈I} b_{ij} x_{ki} x_{kj}) = 0,  (20)

and, for all (l, m) ∈ I × I,

Σ_{k=1}^N (x_{kl} x_{km} / Σ_{i∈I} x_{ki}²) (Σ_{i∈I} y_{ki} x_{ki} − Σ_{i∈I} a_i x_{ki} − (1/2) Σ_{i,j∈I} b_{ij} x_{ki} x_{kj}) = 0.  (21)

Since F is a quadratic function of the coefficients, its second-order directional derivative in any direction (u, v) satisfies

2 Σ_{k=1}^N (Σ_{i∈I} u_i x_{ki} + (1/2) Σ_{i,j∈I} v_{ij} x_{ki} x_{kj})² / Σ_{i∈I} x_{ki}² ≥ 0,  (22)

so every solution of the simultaneous system of equations (20) and (21) is a point of local minimum of F. Moreover, since (22) is true for every direction (u, v), every solution of this system of equations is also a global minimum of F.

Now, observe that I is a countably infinite set, and both (20) and (21) have infinitely many equations and unknowns. In general, for every finite sample size N, the solutions to (20) and (21) will yield estimators for α and B that perfectly fit the data; see (Ramsay & Silverman, 2005). In practice, we will consider a finite subset F of I to perform the expansions of the covariates, the responses and the parameters. This will lead to the finite version of equations (20) and (21), from which we will obtain our estimators â_i and b̂_{ij} of a_i and b_{ij}. Observe that, in general, these estimators depend on the choice of F ⊂ I. They are based on the approximate relation

Σ_{i∈F} y_{ki} x_{ki} ≈ Σ_{i∈F} a_i x_{ki} + (1/2) Σ_{i,j∈F} b_{ij} x_{ki} x_{kj},  (23)

which comes from (15).
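The two ingredients of the construction, the symmetric extension of the slope function and the algebraic trick that the integral over the triangle is half the integral over the square for a symmetric kernel, can be verified on a grid. The sketch below is a hypothetical discretized illustration (the slope function, covariate path, and grid are our own choices, not from this note).

```python
import numpy as np

# Sketch (hypothetical grid example): the symmetric extension B of a slope
# function beta, and a numerical check of the algebraic fact that, for a
# symmetric kernel, the integral of B(s, t) X(s) X(t) over the triangle
# {t <= s} is half of the integral over the whole square.
a, b, n = 0.0, 1.0, 400
t = np.linspace(a, b, n)
dt = t[1] - t[0]
ss, tt = np.meshgrid(t, t, indexing="ij")  # ss: first argument, tt: second

beta = lambda s, u: s - u                  # hypothetical slope function on T
B = np.where(tt <= ss, beta(ss, tt), beta(tt, ss))  # symmetric extension

X = np.sin(2 * np.pi * t)                  # a hypothetical covariate path
G = B * np.outer(X, X)                     # values of B(s, t) X(s) X(t)

triangle = np.sum(np.tril(G)) * dt**2      # Riemann sum over {t <= s}
square = np.sum(G) * dt**2                 # Riemann sum over the square
print(triangle, 0.5 * square)              # the two agree
```

Because G is symmetric and vanishes on the diagonal here, the triangle sum equals half the square sum up to floating-point error, mirroring the identity used in the derivation.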
The estimators of the parameters α and B will be given by

α̂ = Σ_{i∈F} â_i ϕ_i  (24)

and

B̂ = Σ_{i,j∈F} b̂_{ij} ϕ_i ⊗ ϕ_j,  (25)

where the â_i and b̂_{ij} are the solutions of the finite system of equations: for all l ∈ F,

Σ_{k=1}^N (x_{kl} / Σ_{i∈F} x_{ki}²) (Σ_{i∈F} y_{ki} x_{ki} − Σ_{i∈F} â_i x_{ki} − (1/2) Σ_{i,j∈F} b̂_{ij} x_{ki} x_{kj}) = 0,  (26)

and, for all (l, m) ∈ F × F,

Σ_{k=1}^N (x_{kl} x_{km} / Σ_{i∈F} x_{ki}²) (Σ_{i∈F} y_{ki} x_{ki} − Σ_{i∈F} â_i x_{ki} − (1/2) Σ_{i,j∈F} b̂_{ij} x_{ki} x_{kj}) = 0.  (27)

Observe that this is a linear system in the â_i and b̂_{ij}. If the cardinality of F is p, then this is a system of p² + p equations in p² + p unknowns.
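The finite linear system can be solved as an ordinary least-squares problem. The sketch below is a minimal illustration under our own assumptions: we take the per-sample residual to be ⟨Y_k, X_k⟩ − Σ_i a_i x_{ki} − ½ Σ_{ij} b_{ij} x_{ki} x_{kj}, normalized by ‖X_k‖, and use random placeholder coefficients for the data; all variable names are hypothetical.

```python
import numpy as np

# Minimal sketch (assumed formulation, not the paper's code): solve for the
# a_i and b_ij by least squares on the per-sample scalar residuals
#   <Y_k, X_k> - sum_i a_i x_ki - (1/2) sum_ij b_ij x_ki x_kj,
# each normalized by ||X_k||.  x[k] and y[k] hold the first p basis
# coefficients of X_k and Y_k; here they are random placeholders.
rng = np.random.default_rng(0)
N, p = 200, 4                               # sample size, basis truncation |F| = p
x = rng.standard_normal((N, p))
y = rng.standard_normal((N, p))

u = np.einsum("ki,ki->k", y, x)             # <Y_k, X_k> from the coefficients
w = 1.0 / np.linalg.norm(x, axis=1)         # normalization 1 / ||X_k||

# Design row for sample k: linear part x_ki and quadratic part (1/2) x_ki x_kj
quad = 0.5 * np.einsum("ki,kj->kij", x, x).reshape(N, p * p)
Z = np.hstack([x, quad]) * w[:, None]       # N x (p + p^2) design matrix

theta, *_ = np.linalg.lstsq(Z, u * w, rcond=None)
a_hat = theta[:p]                           # estimates of the a_i
b_hat = theta[p:].reshape(p, p)             # estimates of the b_ij

# The (i, j) and (j, i) columns are identical, so the system is rank
# deficient; lstsq returns the minimum-norm solution, which is symmetric.
print(a_hat.shape, b_hat.shape, np.allclose(b_hat, b_hat.T, atol=1e-6))
```

Using the minimum-norm least-squares solution is one convenient way to handle the redundancy coming from the symmetry b_{ij} = b_{ji}; one could equally impose the symmetry constraint explicitly and solve a smaller system with p + p(p + 1)/2 unknowns.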
Finally, the estimated model will be written

Ŷ(t) = α̂(t) + ∫_a^t B̂(t, s) X(s) ds,  (28)

or, equivalently,

Ŷ(t) = α̂(t) + ∫_a^b B̂(t, s) χ_T(t, s) X(s) ds.  (29)

Final Remarks
The linearity of systems (26) and (27) is of great importance for computational reasons.
The choice of the basis is an important practical issue. As a general guideline, the Fourier (sine and cosine) basis is recommended when the covariates and responses exhibit non-localized frequency behavior, and a wavelet basis should be chosen when their frequency behavior is localized. The case where covariates and responses have opposite localization of their frequency behavior is probably the most difficult to deal with.
Choosing a convenient finite linearly independent orthonormal set {ϕ_i : i ∈ F ⊂ I} to expand the data and the functional parameters will furnish, for suitable bases, smooth estimates of the parameters of the Causal Functional Linear Model. This is a kind of regularization obtained by simply reducing the number of basis functions used in the estimation process; see (Ramsay & Silverman, 2005). Other regularization methods are available, such as penalties given by some functional applied to the parameters, and threshold techniques applied to the set of estimated coefficients.
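The smoothing effect of basis truncation can be seen on a one-dimensional toy curve. The sketch below is a hypothetical example (the signal, noise level, roughness measure, and cutoff are our own choices): keeping only the first p Fourier coefficients of a noisy curve sharply reduces its roughness, here measured by the norm of the discrete second difference.

```python
import numpy as np

# Sketch (hypothetical example): regularization by basis truncation.
# Keeping only the first p Fourier coefficients of a noisy curve acts as a
# smoother, in the spirit of choosing a small finite set F of basis functions.
n = 256
t = np.arange(n) / n
signal = np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)
rng = np.random.default_rng(1)
noisy = signal + 0.3 * rng.standard_normal(n)

def truncate(curve, p):
    c = np.fft.rfft(curve)
    c[p:] = 0.0                     # drop all but the first p coefficients
    return np.fft.irfft(c, n)

rough = lambda f: np.linalg.norm(np.diff(f, 2))   # discrete roughness measure
print(rough(noisy), rough(truncate(noisy, 8)))    # truncation smooths the curve
```

The same mechanism operates in the functional model: shrinking F removes high-frequency components of α̂ and B̂ that would otherwise chase the noise.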