Testing Bivariate Normality Based on Nonlinear Canonical Analysis

Using a test statistic built on wavelet-based estimation of the canonical coefficients of nonlinear canonical analysis, we introduce a new class of tests for bivariate normality. The limit distribution of the new test statistic is established, and some critical values of this distribution are given. The finite-sample performance of the proposed test is evaluated through a Monte Carlo power study and compared with that of existing methods.


Introduction
Multivariate statistical methods for data analysis often require the assumption of normality of the underlying population. A severe departure from normality can result in unreliable statistical conclusions for models based on the normality assumption. In the univariate case, a departure from normality can usually be attributed to the skewness or kurtosis of the data being analyzed. In the multivariate case, the situation becomes much more complicated: a departure from multivariate normality could come from any direction in a multidimensional Euclidean space. Because of this, an existing statistic for testing multinormality can only provide partial information on the assumption, and no statistic can outperform the others in all respects. Therefore, to get a relatively complete understanding of the nature and extent of violations of the normality assumption with multivariate data, it is wise to use several test statistics at the same time. This is why interest in developing normality test statistics has continued (see, e.g., Mardia, 1980; Csörgo, 1986; Zhu et al., 1995; Yang et al., 1996; Henze & Wagner, 1997; Liang et al., 2000; Kim & Bickel, 2003; Von Eye & Bogat, 2004; Székely & Rizzo, 2005). In this paper, we propose a new class of tests for bivariate normality based on wavelet estimation of the canonical coefficients of nonlinear canonical analysis (NLCA) of two random variables.
The paper is organized as follows. The next section sets up the notation and some preliminary results. Section 3 is devoted to the construction of the new test statistic. In Section 4, we give the limit distribution of the test statistic defined in Section 3. Section 5 presents some power comparisons with other statistics.

Notation and Preliminary Results
We consider a probability space (Ω, A, P) and denote by L^2(P) the Hilbert space of random variables with finite second-order moment. Let X and Y be random variables defined on (Ω, A, P), with values in measurable spaces (E_X, T_X) and (E_Y, T_Y) respectively, and with probability distributions denoted by P_X and P_Y. We denote by L^2(P_X) the space of measurable real functions ϕ defined on E_X such that E(ϕ^2(X)) < +∞, and by L^2(P_Y) the analogue of L^2(P_X) with respect to Y. Nonlinear canonical analysis (NLCA) of X and Y is defined by Dauxois and Pousse (1975) as the search for orthonormal bases (ϕ_i)_{i≥1} and (ψ_i)_{i≥1} of L^2(P_X) and L^2(P_Y) respectively, satisfying: where E denotes the mathematical expectation, and and for i ≥ 2: (1.4)

Nonlinear Canonical Analysis (NLCA) of Random Variables
Considering the subspaces of L^2(P), it is known (see Dauxois & Pousse, 1975) that the solution of the NLCA problem is obtained from the spectral analysis of the self-adjoint operator , that is, the restriction of T = E^X E^Y to H_X, where E^X and E^Y are the conditional expectations relative to X and Y, respectively. If T is a compact operator, NLCA exists and is characterized by a triple: where N, N_1, N_2 are elements of ℕ ∪ {+∞}. In this triple, the ρ_i's, called canonical coefficients, are real numbers contained in ]0, 1], the systems ). The sequence of canonical coefficients is nonincreasing and unique and, when T is a compact operator, one has lim_{i→+∞} ρ_i = 0. In this paper, we suppose that T is compact and that the aforementioned sequence is strictly decreasing, that is, ρ_i > ρ_{i+1} for any i ≥ 1. These hypotheses are satisfied when (X, Y) has a bivariate normal distribution (see Dauxois & Pousse, 1975), but also for other families of bivariate distributions (see Buja, 1990). When (X, Y) has the bivariate standard normal distribution with correlation ρ, then (see, e.g., Dauxois & Pousse, 1975) the NLCA of X and Y is given by where H_i are the Hermite polynomials.
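The role of the Hermite polynomials in the normal case can be checked numerically. The sketch below is only an illustration and rests on assumptions not stated in the text: it uses the probabilists' Hermite polynomials He_i normalized by sqrt(i!), and relies on the classical identity E[h_i(X) h_j(Y)] = ρ^i if i = j and 0 otherwise for a standard bivariate normal pair with correlation ρ, so that the squared coefficients are ρ^{2i}. The function names are hypothetical.

```python
import math
import random


def normalized_hermite_values(x, max_degree):
    """Return [h_0(x), ..., h_max_degree(x)] with h_i = He_i / sqrt(i!)."""
    values = [1.0]
    if max_degree >= 1:
        values.append(x)
    for k in range(1, max_degree):
        # Three-term recurrence: He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)
        values.append(x * values[k] - k * values[k - 1])
    return [v / math.sqrt(math.factorial(i)) for i, v in enumerate(values)]


def cross_moments(rho, max_degree=2, n=100_000, seed=0):
    """Monte Carlo estimate of the matrix E[h_i(X) h_j(Y)], 0 <= i, j <= max_degree."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    size = max_degree + 1
    sums = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)  # corr(X, Y) = rho
        hx = normalized_hermite_values(x, max_degree)
        hy = normalized_hermite_values(y, max_degree)
        for i in range(size):
            for j in range(size):
                sums[i][j] += hx[i] * hy[j]
    return [[value / n for value in row] for row in sums]
```

With ρ = 0.5, the diagonal entries of `cross_moments(0.5)` should be close to 1, 0.5 and 0.25, and the off-diagonal entries close to 0, up to Monte Carlo error.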

Estimation of Nonlinear Canonical Analysis
Let {(X_i, Y_i)}_{1≤i≤n} be an i.i.d. sample of size n, where each pair (X_i, Y_i) has the same distribution as (X, Y). The aim of this section is to recall the principle of wavelet estimation of NLCA. Let {V^(1)_j}_{j∈ℤ} and {V^(2)_j}_{j∈ℤ} be two multiresolution analyses (see, e.g., Meyer (1990) for a definition) with father wavelets φ_1 and φ_2, respectively. Given a nondecreasing sequence (j_n)_{n∈ℕ} in ℤ such that lim_{n→+∞} j_n = +∞, we consider the estimator f̂_n of f, the density of (X, Y), defined (in Vidakovic (1999), for example) by where, with, for all (ℓ, j) ∈ {1, 2} × ℤ and all (x, y) ∈ ℝ^2 We consider the wavelet estimation of NLCA defined in Niang et al. (2012) by the family ρ and for i ≥ 2:
Remark 1 For practical computation of the estimators introduced above, see Niang et al. (2012). Under some conditions, asymptotic properties of these estimators are established by Niang et al. (2012). The following lemma states the result that yields a limiting distribution for ρ̂_i^(n). Note that we consider, without loss of generality, the squared canonical coefficients λ_i = ρ_i^2 and their estimators λ̂_i^(n).
Lemma 1 For all i ≥ 1, we have the convergence in distribution, as n → +∞, of the random variable √n (λ̂_i^(n) − λ_i) to a random variable with normal distribution N(0, σ_i^2), where The proof of Lemma 1 can be found in Niang et al. (2012). This lemma will be useful for establishing the asymptotic normality of our proposed test statistic.
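To make the idea of a linear wavelet density estimator concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: both father wavelets are the Haar function (the indicator of [0, 1)), the resolution level j is fixed by hand, and the coefficients are plain empirical means of products of the scaled and translated father wavelets. In that special case the estimator reduces to a two-dimensional histogram on dyadic cells of side 2^{-j}. The function names are hypothetical.

```python
import math
from collections import Counter


def haar_coefficients(sample, j):
    """Empirical scaling coefficients c[k, l] = (1/n) sum_i phi_{j,k}(X_i) phi_{j,l}(Y_i).

    For the Haar father wavelet, phi_{j,k}(x) = 2^(j/2) on [k 2^-j, (k+1) 2^-j)
    and 0 elsewhere, so each product equals 2^j on one dyadic cell.
    """
    scale = 2 ** j
    n = len(sample)
    counts = Counter((math.floor(scale * x), math.floor(scale * y)) for x, y in sample)
    return {cell: scale * count / n for cell, count in counts.items()}


def haar_density_estimate(coeffs, j, x, y):
    """Evaluate f_hat(x, y) = sum_{k,l} c[k, l] phi_{j,k}(x) phi_{j,l}(y)."""
    scale = 2 ** j
    cell = (math.floor(scale * x), math.floor(scale * y))
    # Only the cell containing (x, y) contributes, with factor 2^(j/2) * 2^(j/2).
    return scale * coeffs.get(cell, 0.0)
```

Smoother father wavelets would replace the cell lookup by sums over overlapping translates; the Haar choice is made here only to keep the sketch short.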

Constructing the Test Statistic
When (X, Y) has a standard normal distribution with correlation ρ and null expectation, then Thus, for all m ∈ ℕ*, putting Φ^(m) = Σ_{i=1}^m λ_i, one can test whether (X, Y) follows the normal distribution described above by testing the null hypothesis versus the alternative hypothesis In order to do that, we can take as test statistic the random variable , where the λ̂_i^(n) are the wavelet estimators of λ_i = ρ^{2i} (see Section 2.2). We now describe the asymptotic properties of Φ_n^(m).
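Since λ_i = ρ^{2i} in the normal case, the value of Φ^(m) under normality is a finite geometric sum. The short sketch below computes it both term by term and in closed form; the function names are hypothetical and serve only as an illustration.

```python
def phi_under_normality(rho, m):
    """Sum of the first m squared canonical coefficients rho^(2i), i = 1..m."""
    return sum(rho ** (2 * i) for i in range(1, m + 1))


def phi_under_normality_closed_form(rho, m):
    """Same quantity via the geometric-sum formula r (1 - r^m) / (1 - r), r = rho^2."""
    r = rho * rho
    if r == 1.0:
        return float(m)  # degenerate case |rho| = 1: every term equals 1
    return r * (1.0 - r ** m) / (1.0 - r)
```

For instance, with ρ = 0.5 and m = 2 both functions give 0.25 + 0.0625 = 0.3125.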

Limiting Distribution of the Test Statistic
Under H_0, the limiting distribution of the test statistic defined above is given in the following theorem.
Theorem 1 Under hypothesis H_0, the random variable √n (Φ_n^(m) − Φ^(m)) converges in distribution, as n → +∞, to a random variable with normal distribution N(0, σ(m)^2), where σ(m)^2 = Var(g_m(X, Y)), with g_m(x, y) = Σ_{i=1}^m
Proof. This result is a consequence of Lemma 1. In fact, we have where R n,i . The relation (2.1) allows us to write ∫ g_m(x, y)(f̂_n(x, y) − f(x, y)) dx dy converges in distribution, as n → +∞, to a random variable with normal distribution N(0, σ(m)^2), where σ(m)^2 = Var(g_m(X, Y)); this yields the proof.
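An asymptotically normal statistic of this kind leads to a standard z-type decision rule. The following sketch shows one possible form and is not the procedure used in the simulations: it assumes a two-sided rejection region, a value of σ(m) that is known or has been estimated separately, and an estimate of Φ^(m) already computed from the data. All function and parameter names are hypothetical.

```python
import math


def two_sided_normal_pvalue(z):
    """P(|N(0, 1)| >= |z|), computed with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def asymptotic_test(phi_estimate, phi_null, sigma_m, n, alpha=0.05):
    """Standardize sqrt(n) (phi_estimate - phi_null) by sigma_m and compare with N(0, 1).

    phi_estimate : estimate of Phi^(m) from a sample of size n (assumed given)
    phi_null     : value of Phi^(m) under the null hypothesis
    sigma_m      : asymptotic standard deviation (assumed known or estimated)
    Returns (z, p_value, reject).
    """
    z = math.sqrt(n) * (phi_estimate - phi_null) / sigma_m
    p_value = two_sided_normal_pvalue(z)
    return z, p_value, p_value < alpha
```

For small samples, simulated critical values may be preferable to the normal approximation.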

Simulations
In this section, we illustrate the previous procedure for testing bivariate normality by applying it to various data sets. In order to assess performance on finite samples, the procedure is applied to simulated data from bivariate random variables (X, Y) with known distributions. The objective is to estimate the power of some tests of our class and to compare it with that of the following three affine-invariant tests for bivariate normality.

Mardia's Multivariate Kurtosis and Skewness Test
Mardia (1980) proposed a test of multivariate normality based on skewness and kurtosis. The multivariate skewness test proposed by Mardia (MARD) is based on the sample skewness statistic defined where Σ̂ = n^{-1} Σ_{j=1}^n (X_j − X̄)(X_j − X̄)^t denotes the maximum likelihood estimator of the population covariance and A^t is the transpose of A. Normality is rejected for large values of m_{1,d}.
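For the bivariate case, the sample skewness can be computed directly. The sketch below assumes the usual form of Mardia's statistic, n^{-2} Σ_i Σ_j [(X_i − X̄)^t Σ̂^{-1} (X_j − X̄)]^3 with Σ̂ the maximum likelihood covariance estimator; any additional normalization used for the test itself is not covered, and the function name is hypothetical.

```python
def mardia_skewness_2d(sample):
    """Mardia-type sample skewness for a list of (x, y) pairs (d = 2)."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    centered = [(x - mean_x, y - mean_y) for x, y in sample]

    # Maximum likelihood covariance estimator (divisor n, not n - 1)
    sxx = sum(u * u for u, _ in centered) / n
    syy = sum(v * v for _, v in centered) / n
    sxy = sum(u * v for u, v in centered) / n
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("singular sample covariance matrix")

    # Entries of the inverse of the 2 x 2 covariance matrix
    a, b, c = syy / det, -sxy / det, sxx / det

    total = 0.0
    for u1, v1 in centered:
        w1 = a * u1 + b * v1  # first component of Sigma^-1 (X_i - mean)
        w2 = b * u1 + c * v1  # second component
        for u2, v2 in centered:
            total += (w1 * u2 + w2 * v2) ** 3
    return total / n ** 2
```

The double loop costs O(n^2) operations, which is acceptable for the sample sizes considered in the simulations.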

Test of Malkovich and Afifi
The test of normality proposed by Malkovich and Afifi (1973) is a generalization of the univariate Shapiro-Wilk test. For comparison, we also report the power of Malkovich and Afifi's (MA) generalized Shapiro-Wilk W statistic. The Shapiro-Wilk W statistic for testing univariate normality is where the Z_(j)'s are the order statistics of Z_1, ..., Z_n, Z̄ = n^{-1} Σ_j Z_j, and the a_j's are the coefficients tabulated in Shapiro and Wilk (1965). The test of Malkovich and Afifi accepts the hypothesis of multivariate normality if where k_ω is a constant.

Test of Székely and Rizzo
Recently, Székely and Rizzo (2005) (SR) proposed a test of multivariate normality based on Euclidean distance between sample elements. Let X_1, ..., X_n be a random sample from a d-variate population with distribution F, and let x_1, ..., x_n be the observed values of the random sample. The test statistic proposed by Székely and Rizzo for testing H_0: where X and X′ are independent and identically distributed with the distribution F_0. If the hypothesized distribution is d-variate normal with mean vector μ and nonsingular covariance matrix Σ, denoted N_d(μ, Σ), consider the transformed sample where Z and Z′ denote iid N_d(0, I) random variables, and I is the d × d identity matrix. A test of the simple hypothesis of d-variate normality, d ≥ 1, rejects the null hypothesis for large values of E_{n,d}.
In the following section, we consider the above three test statistics in the bivariate case.

Simulation Results
A Monte Carlo experiment was performed to study the power of the test based on Φ_n^(m). The critical values are given in Table 1 and Table 2 for sample sizes n = 20, 30, 50, 100, the usual significance levels α = 0.01, 0.05 and 0.10, and correlations ρ = 0.5 and ρ = 0.8. Each empirical percentage is based on 1000 realizations of Φ_n^(m).
) denote the standard normal, uniform and exponential distributions; χ^2_k is the chi-square distribution with k degrees of freedom; Γ(a, b) is the Gamma distribution with density b^{-a} Γ(a)^{-1} x^{a-1} exp(−x/b), x > 0; B(a, b) stands for the beta distribution with density 1

Table 3.
Percentage of rejections of the bivariate normality hypothesis by the tests based on Φ_n^(m), MA, MARD and SR, with α = 0.05, based on sample size n and 1000 replications
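As a rough illustration of how empirical critical values and rejection percentages of this kind can be obtained by Monte Carlo, consider the sketch below. It does not reproduce the wavelet-based statistic: as a plainly labeled stand-in it uses |√n (r^2 − ρ^2)|, where r is the sample Pearson correlation, since ρ^2 is the first squared canonical coefficient in the normal case. The function names, the seed and the stand-in statistic are all assumptions made for the example.

```python
import math
import random


def bivariate_normal_sample(n, rho, rng):
    """Draw n pairs from the standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        sample.append((x, rho * x + s * rng.gauss(0.0, 1.0)))
    return sample


def stand_in_statistic(sample, rho):
    """Stand-in statistic |sqrt(n) (r^2 - rho^2)|, NOT the wavelet-based one."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    syy = sum((y - mean_y) ** 2 for _, y in sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    r_squared = sxy * sxy / (sxx * syy)
    return abs(math.sqrt(n) * (r_squared - rho * rho))


def empirical_critical_value(n, rho, alpha, replications, rng):
    """Empirical (1 - alpha) quantile of the statistic simulated under the null."""
    values = sorted(
        stand_in_statistic(bivariate_normal_sample(n, rho, rng), rho)
        for _ in range(replications)
    )
    index = min(replications - 1, math.ceil((1.0 - alpha) * replications) - 1)
    return values[index]


def rejection_percentage(sampler, n, rho, critical_value, replications, rng):
    """Percentage of simulated samples whose statistic exceeds the critical value."""
    rejections = sum(
        stand_in_statistic(sampler(n, rng), rho) > critical_value
        for _ in range(replications)
    )
    return 100.0 * rejections / replications
```

Passing a sampler that draws from a non-normal alternative gives an empirical power estimate; passing the null sampler itself should return a percentage close to 100α, which is a useful sanity check on the simulated critical value.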