Asymptotic Distribution of Cramér-von Mises Statistic When Contamination Exists

In this paper, the asymptotic distribution of the Cramér-von Mises goodness-of-fit test statistic is investigated when contamination exists. We first derive the asymptotic distribution of the statistic when the observations are contaminated with noise as a mixture. The result is then extended to the case where the parameters are estimated by the minimum distance estimator, which minimizes the Cramér-von Mises statistic. In both cases the asymptotic distribution is that of a weighted infinite sum of non-central $\chi^2_1$ variables, and the effect of contamination appears only in the non-centrality of the variables. We also examine the robustness of the goodness-of-fit test by Monte Carlo simulations when the parameters are estimated by the minimum distance estimator and by the maximum likelihood estimator. Numerical experiments indicate that the use of the minimum distance estimator makes the test insensitive to contamination, while the power remains almost the same as that obtained with the maximum likelihood estimator.


Introduction
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed observations, and let $X_{(j)}$, $j = 1, 2, \ldots, n$, be their order statistics. In this paper, we assume that the distribution is contaminated as a mixture $F_\varepsilon(x, \theta)$ of $F(x, \theta)$ and $G(x)$, where $F(x, \theta)$ is a continuous distribution with the parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_m)^\top \in \Theta \subset \mathbb{R}^m$, and $G(x)$ is the distribution of the contamination, with contamination rate $\varepsilon \ge 0$. We hereafter assume that both distributions $F(x, \theta)$ and $G(x)$ have bounded and smooth densities $f(x, \theta)$ and $g(x)$, respectively. We are interested in the behavior of the goodness-of-fit test statistic $W_n^2(\theta)$. The asymptotic behavior of $W_n^2(\theta)$ when no contamination exists has been thoroughly investigated, including the case where the parameter $\theta$ is estimated (for example, Shorack & Wellner, 1986). However, only a few studies have addressed the case where contamination exists. We first derive the asymptotic distribution of $W_n^2(\theta)$ in a general framework via an elementary matrix calculation. We next employ the minimum distance estimator $\hat\theta$, which minimizes $W_n^2(\theta)$, as an estimate of $\theta$. There are two reasons why we are interested in the minimum distance estimator. The first is that the estimator shares the same loss function with $W_n^2(\theta)$. It therefore seems natural to employ the same loss function for both parameter estimation and goodness-of-fit testing. The other reason is that $\hat\theta$ is robust as an estimator, as shown in Millar (1981), for example. Woodward, Parr, Schucany & Lindsey (1984) demonstrated via numerical experiments that the minimum distance estimator is better than the maximum likelihood estimator under symmetric departures from normality for each component in normal mixture models. Since then, the minimum distance estimator has often been used in practice for mixture models (Beutner & Bordes, 2011; Garcia-Dorado & Marin, 1998).
The rest of this paper is organized as follows. In Section 2 the asymptotic distribution of $W_n^2(\theta)$ is derived. The effect of contamination appears only in the non-centrality of the weighted infinite sum of non-central $\chi^2_1$ variables. In Section 3 the result is extended to the case where the parameters are estimated by the minimum distance estimator. The result is useful not only for understanding the effect of contamination on the asymptotic distribution, but also for obtaining the weights and non-centralities of the $\chi^2_1$ variables for finite sample sizes. The method described here is much simpler than solving the integral equation, as in the case of no contamination. The robustness of the test when the minimum distance estimator is used is demonstrated by computer simulations in Section 4. All proofs are given in the Appendix.

Asymptotic Distribution When the Parameters Are Known
We rewrite $W_n^2(\theta)$ as $W_n^2(\theta) = \|(n+1) S_n U_n\|^2$ by introducing an $n \times (n+1)$ matrix $S_n$ and an $(n+1)$-dimensional vector $U_n$. Here we define $F(X_{(0)}, \theta) = 0$ and $F(X_{(n+1)}, \theta) = 1$ for convenience. We also define a diagonal matrix $B$ with diagonal elements $b_1 = b_2$ and [...], where $f_\varepsilon(x, \theta)$ and $F_\varepsilon^{-1}(u, \theta) = x$ are the density function and the inverse function of $F_\varepsilon(x, \theta)$, respectively. Taylor expansion yields an approximation of $U_n$ as [...], where [...] 1 and all others are equal to 1. We now see that it is enough to know the distribution of $\sum_{j=1}^{n} \lambda_{nj} (V_{nj} + \mu_{nj})^2$ in place of $W_n^2(\theta)$, where [...]. Here $\Lambda_n$ is a diagonal matrix of eigenvalues $\lambda_{n1} \ge \lambda_{n2} \ge \ldots \ge \lambda_{nn}$, and $P_n$ is an orthogonal matrix of eigenvectors $p_j^{(n)}$, $j = 1, 2, \ldots$, of $S_n B^2 S_n^\top$.

Proposition 1. For any fixed $j > 0$, as $n$ tends to infinity, $\lambda_{nj}$ converges to $\lambda_j = 1/(j\pi)^2$ and $\sqrt{n}\, p^{(n)}_{\lceil nu \rceil j}$ converges to $f_j(u) = \sqrt{2} \sin(\pi j u)$ for $0 < u < 1$, which are the eigenvalues and the eigenfunctions of the integral equation

$$\lambda f(u) = \int_0^1 \{\min(u, v) - uv\} f(v)\, dv .$$

Here $p^{(n)}_{kj}$ is the $k$th element of $p^{(n)}_j$ and $\lceil x \rceil$ is the smallest integer greater than or equal to $x$.
Note that the eigenvalues and the eigenvectors become independent of the contamination in this limit.
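The limit in Proposition 1 can be checked numerically. The sketch below assumes that the limiting integral equation has the classical kernel $K(u, v) = \min(u, v) - uv$, which is the kernel whose eigenvalues are $1/(j\pi)^2$ and whose eigenfunctions are $\sqrt{2}\sin(\pi j u)$. It does not construct $S_n$ or $B$; the function name `bridge_kernel_eigen` and the uniform grid are illustrative choices only.

```python
import numpy as np

def bridge_kernel_eigen(n, k=5):
    """Leading k eigenpairs of the kernel min(u, v) - u*v discretized on n grid points."""
    h = 1.0 / (n + 1)                              # grid spacing, also the quadrature weight
    u = h * np.arange(1, n + 1)                    # interior grid points in (0, 1)
    K = np.minimum.outer(u, u) - np.outer(u, u)    # assumed limiting kernel
    vals, vecs = np.linalg.eigh(h * K)             # symmetric eigenproblem
    idx = np.argsort(vals)[::-1][:k]               # largest eigenvalues first
    vals, vecs = vals[idx], vecs[:, idx]
    vecs = vecs * np.sign(vecs[0, :])              # fix the sign of each eigenvector
    return u, vals, vecs / np.sqrt(h)              # rescale to approximate eigenfunctions

u, lam, phi = bridge_kernel_eigen(400)
j = np.arange(1, 6)
print("max eigenvalue error:", np.max(np.abs(lam - 1.0 / (j * np.pi) ** 2)))
print("max eigenfunction error:",
      np.max(np.abs(phi - np.sqrt(2) * np.sin(np.pi * np.outer(u, j)))))
```

The rescaling by $1/\sqrt{h}$ plays the role of the factor $\sqrt{n}$ in Proposition 1, so the columns of `phi` are compared directly with $\sqrt{2}\sin(\pi j u)$ on the grid.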
Proposition 2. Any finite-dimensional random vector $(V_{n1}, V_{n2}, \ldots, V_{np})^\top$ converges in distribution to a normally distributed random vector with mean 0 and variance $I_p$ as $n$ tends to infinity.
Combining these results, we see that [...], where $V_1, V_2, \ldots, V_p$ are given in the proof of Proposition 2. On the other hand, $\sum_{j=1}^{n} \lambda_{nj} \mu_{nj}^2$ converges to $\sum_{j=1}^{\infty} \lambda_j \mu_j^2$ because of the boundedness of $\sum_{j=1}^{n} \lambda_{nj} \mu_{nj}^2$. We therefore have the following theorem.
Theorem 1. The distribution of $W_n^2(\theta)$ converges to that of $\sum_{j=1}^{\infty} \lambda_j (Y_j + \mu_j)^2$ as $n$ tends to infinity, where $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with a standard normal distribution and [...].

We see that contamination affects only the $\mu_j$, in proportion to $\varepsilon$. Theorem 1 reduces to the well-known result when no contamination exists, which is derived as an application of the theory of empirical processes. A result closely related to Theorem 1 is given by Guttorp & Lockhart (1988). They developed a general theory for the asymptotic distribution of quadratic forms of order statistics from a uniform distribution under contiguous alternatives, where the densities are of the form $1 + \delta \eta(u)/n^{1/2}$ under the condition $\int_0^1 \eta(u)^2\, du = 1$. For $\eta(F(x, \theta)) = g(x)/f(x, \theta) - 1$, the same result as in Theorem 1 can be derived from their theory. However, a major difference is that Theorem 1 is free from the constraint $\int_0^1 \eta(u)^2\, du < \infty$. For the proof, we have used only the fact that $f(x, \theta)$ and $g(x)$ are probability density functions.
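As a small worked consequence, assume that the limit in Theorem 1 has the form $\sum_{j} \lambda_j (Y_j + \mu_j)^2$ with the weights $\lambda_j = 1/(j\pi)^2$ of Proposition 1. The mean of the limiting distribution can then be written down directly. The interchange of expectation and summation is an added step, justified by the non-negativity of the terms:

$$
\mathrm{E}\Big[\sum_{j=1}^{\infty} \lambda_j (Y_j + \mu_j)^2\Big]
= \sum_{j=1}^{\infty} \lambda_j (1 + \mu_j^2)
= \frac{1}{\pi^2} \sum_{j=1}^{\infty} \frac{1}{j^2} + \sum_{j=1}^{\infty} \lambda_j \mu_j^2
= \frac{1}{6} + \sum_{j=1}^{\infty} \lambda_j \mu_j^2 .
$$

The first term, $1/6$, is the mean of the limit when no contamination exists. Under this reading, contamination shifts the mean upward by $\sum_{j} \lambda_j \mu_j^2$, the same quantity whose boundedness is used in the argument preceding the theorem.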

Asymptotic Distribution When the Parameters Are Estimated
We need the following assumptions in order to derive the asymptotic distribution of $W_n^2(\hat\theta)$ when the parameters are estimated by the minimum distance estimator $\hat\theta$.
) is of full rank.
Without loss of generality, we may assume that $Z_n(\theta)$ is of full rank for each $n$, in view of the third assumption in Assumption 2.
Since the minimum distance estimator $\hat\theta$ is the solution of [...]. It is further approximated as [...]. If we note that $W_n^2(\hat\theta)$ is asymptotically equivalent to [...], we see that it is enough to derive the asymptotic distribution of [...], where [...].

Proposition 3. For any fixed $j > 0$, $\tilde\lambda_{nj}(\theta)$ converges to $\lambda_j$ and $q_j^{(n)}$ converges to $q_j$, where $\lambda_j$ and $q_j = (q_{1j}, q_{2j}, \ldots)^\top$ are the $j$th eigenvalue and eigenvector of the following infinite-dimensional matrix. [...]

[...] with mean 0 and variance $I_p$ as $n$ tends to infinity.
If we note that $\sum_{j=1}^{n-m} \tilde\lambda_{nj}(\theta)\, \tilde\mu_{nj}^2(\theta) \le \sum_{j=1}^{n} \lambda_{nj} \mu_{nj}^2$, we have the following theorem, which is similar to Theorem 1.
Theorem 2. The distribution of $W_n^2(\hat\theta)$ converges to that of $\sum_{j=1}^{\infty} \lambda_j (Y_j + \tilde\mu_j)^2$ as $n$ tends to infinity, where $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with a standard normal distribution, and [...], where the $\mu_l$ terms are the same as those in Theorem 1.
In the case where no estimation is involved, the asymptotic distribution derived in Theorem 2 is consistent with the classical result derived from the theory of the Brownian bridge (Shorack & Wellner, 1986). In fact, the eigenvalues $\lambda_j$ are consistent with those of the integral equation with the kernel function [...]. However, it is well known that solving such an integral equation accurately at a reasonable computational cost is not trivial. The result above gives an alternative and simpler way to perform the calculation.

The Distribution for Finite n
To evaluate the distribution of $W_n^2(\theta)$ or $W_n^2(\hat\theta)$ for finite $n$, the proofs in the Appendix suggest a good way of performing the calculation. In the case of Theorem 1, once the eigenvalues and the eigenvectors of $S_n B^2 S_n^\top$ are found, $\sum_{j=1}^{p} \lambda_{nj} (Y_j + \mu_{nj})^2$ gives a good approximation of $W_n^2(\theta)$ for an appropriate choice of $p < n$. In the case of Theorem 2, once the eigenvalues and the eigenvectors of $D_n(\hat\theta)$ are found, $\sum_{j=1}^{p} \tilde\lambda_{nj}(\hat\theta) (Y_j + \tilde\mu_{nj}(\hat\theta))^2$ gives a good approximation of $W_n^2(\hat\theta)$ for an appropriate choice of $p < n - m$. The replacement of $\theta$ by $\hat\theta$ in the calculation is justified by the consistency of $\hat\theta$, which holds irrespective of the existence of contamination. In fact, [...] converges to 0 in probability as $n$ tends to infinity. This follows from the strong consistency of $\hat\theta$ and Lemma A1 in the Appendix.
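Once weights and non-centralities are available, the truncated sum can be evaluated by straightforward simulation. The following is a minimal sketch under stated assumptions, not the calculation used in this paper: the weights are the limiting values $1/(j\pi)^2$ from Proposition 1, used as a stand-in for the finite-$n$ eigenvalues, which are not computed here; the non-centrality vector `mu1`, the truncation level `p = 50`, and the function name `simulate_weighted_chi2` are hypothetical.

```python
import numpy as np

def simulate_weighted_chi2(weights, noncentrality, size=50_000, seed=0):
    """Draw from sum_j w_j * (Y_j + mu_j)**2 with Y_j i.i.d. standard normal."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(noncentrality, dtype=float)
    y = rng.standard_normal((size, w.size))   # one row per simulated statistic
    return ((y + mu) ** 2) @ w                # weighted sum over the p terms

p = 50                                # truncation level (illustrative choice)
j = np.arange(1, p + 1)
lam = 1.0 / (j * np.pi) ** 2          # stand-in weights: limits from Proposition 1
mu0 = np.zeros(p)                     # no contamination
mu1 = np.where(j == 1, 1.0, 0.0)      # hypothetical non-centrality on the first term

null_draws = simulate_weighted_chi2(lam, mu0)
crit = np.quantile(null_draws, 0.90)  # approximate critical value at level 0.1
shifted = simulate_weighted_chi2(lam, mu1, seed=1)
print("approximate critical value:", crit)
print("exceedance probability with non-centrality:", np.mean(shifted > crit))
```

With zero non-centrality the simulated critical value should land near the classical tabulated 10% point of the Cramér-von Mises limit, about 0.347, slightly reduced by the truncation. Replacing `lam` and `mu1` by eigenvalues and non-centralities computed for a given $n$ would give the finite-sample approximation described above.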

Investigation of Robustness by Simulations
It is still unclear how advantageous the use of the minimum distance estimator is over other estimators, for example, the maximum likelihood estimator. In this section, we demonstrate via Monte Carlo simulations that the power curves remain almost identical when no contamination exists, but that the rejection probability is significantly affected by contamination when the maximum likelihood estimator is used.
The sample size here is fixed at 200, and the significance level is 0.1. The solid line in each panel is for the minimum distance estimator and the dotted line is for the maximum likelihood estimator.
The power curves given in Figure 1 suggest that there is no significant difference between the minimum distance estimator and the maximum likelihood estimator when no contamination exists. The left panel shows the power curves for Exponential against Gamma with the parameter ν, and the right panel shows them for Normal against Student's t with the parameter ν = 1/d, where d is the number of degrees of freedom.
On the other hand, if contamination exists, the rejection probability quickly increases for the maximum likelihood estimator, but it does not increase for the minimum distance estimator. Two examples are shown in Figure 2. The results indicate that the use of the minimum distance estimator makes the goodness-of-fit test robust to contamination.
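One ingredient of such a study can be sketched in a few lines. The example below is an illustration under assumptions, not the simulation design used here: it takes the classical form of the Cramér-von Mises statistic, an Exponential null with unknown scale, and a hypothetical contamination distribution (an Exponential with scale 10 at rate 0.1); the sample size is enlarged to 2000 so that the contrast is visible in a single draw. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

def cvm_statistic(x_sorted, scale):
    """Classical Cramér-von Mises statistic for an Exponential(scale) null."""
    n = x_sorted.size
    u = expon.cdf(x_sorted, scale=scale)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)

def min_distance_scale(x_sorted):
    """Scale minimizing the statistic, found by a bounded one-dimensional search."""
    m = x_sorted.mean()
    res = minimize_scalar(lambda s: cvm_statistic(x_sorted, s),
                          bounds=(0.05 * m, 5.0 * m), method="bounded")
    return res.x

def contaminated_sample(n, eps, rng, noise_scale=10.0):
    """Sorted sample from (1 - eps) * Exp(1) + eps * Exp(noise_scale)."""
    is_noise = rng.random(n) < eps
    x = np.where(is_noise, rng.exponential(noise_scale, n), rng.exponential(1.0, n))
    return np.sort(x)

rng = np.random.default_rng(0)
x = contaminated_sample(2000, 0.1, rng)
ml = x.mean()                    # maximum likelihood estimate of the scale
md = min_distance_scale(x)       # minimum distance estimate of the scale
print("ML scale:", ml, "statistic:", cvm_statistic(x, ml))
print("MD scale:", md, "statistic:", cvm_statistic(x, md))
```

The maximum likelihood estimate is pulled toward the contamination, while the minimum distance estimate stays closer to the scale of the main component. The two raw statistics are not directly comparable as test decisions, because the null distribution, and hence the critical value, depends on which estimator is used; a full study would simulate the critical values for each estimator separately.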

Concluding Remarks
The asymptotic distributions of the Cramér-von Mises statistic are derived when the observations are contaminated. Numerical experiments indicate that the use of the minimum distance estimator makes the test less sensitive to contamination, although the power stays almost the same as that obtained with the maximum likelihood estimator. Such insensitivity would be harmful when the aim of the test is to detect the existence of contamination. However, it becomes advantageous if the aim of the test is to check whether the underlying distributional model is usable or not. It often happens in practice that hypothesis testing is not the goal but the beginning of the analysis. In that case, a goodness-of-fit test statistic that is insensitive to a small amount of contamination is preferred.
We leave other problems for further investigation, including a more explicit analysis of sensitivity to contamination and an extension of the theorems to other types of estimators. An extension to the multivariate case would also be interesting, although the distribution of the statistic becomes much more complicated even when the parameters are known (Rosenblatt, 1952).
Normal against Student's t test.