Use of Hotelling's $T^2$: Outlier Diagnostics in Mixtures

Given Gaussian observation vectors $[Y_1, \ldots, Y_n]$ having a common mean and dispersion matrix, a pervading issue is to identify shifted observations of type $\{Y_i \to Y_i + \delta_i\}$. Conventional usage enjoins Hotelling's $T_i^2$ diagnostics, derived and applied under the mutual independence of $[Y_1, \ldots, Y_n]$. Independence often fails, yet the need to identify outliers nonetheless persists. Accordingly, the present study reexamines $T_i^2$ under dependencies to include equicorrelations and more general matrices. Such dependencies are found in the analysis of calibrated vector measurements and elsewhere. In addition, mixtures of these distributions having star-shaped contours arise on occasion in practice. Nonetheless, the $T_i^2$ diagnostics are shown to remain exact in level and power for all such mixtures. Moreover, further matrix distributions, not necessarily having finite moments, are seen to generalize n-dimensional spherical symmetry to include non-Gaussian matrices of order $(n \times k)$ supporting $T_i^2$. For these the use of $T_i^2$ remains exact in level. These findings serve to expand considerably the range of applicability of $T_i^2$ in practice, to include matrix Cauchy and other heavy-tailed distributions intrinsic to econometric and other studies. Case studies serve to illuminate the methodology.


Introduction
Consider the observation vectors $\{Y_i' = [Y_{i1}, \ldots, Y_{ik}];\ 1 \le i \le n\}$ having identical means $E(Y_i') = \mu' = [\mu_1, \ldots, \mu_k]$ and dispersion matrix $\{V(Y_i) = \Sigma;\ 1 \le i \le n\}$. The model is $\{Y_0 = 1_n \mu' + \mathcal{E}_0\}$, where $\mathcal{E}_0 = [\epsilon_{ij}]$ consists of random errors, and $E_0 = [e_{ij}]$ contains the ordinary observed residuals. The problem at issue, and of persistent concern to users, is whether shifts of type $\{Y_i \to Y_i + \delta_i\}$ in $R^k$ may have occurred. Numerous approaches have been advocated, to include graphics and numerical diagnostics. Selected references are listed subsequently; a recent survey is Rodrigues and Boente (2011). Prominent deletion diagnostics are modeled on Mahalanobis (1936) distance metrics in $R^k$.

Under single-case deletions, the rows $Y_i'$ and $e_i'$ are set apart by partitioning $Y_0 = [Y', Y_i]'$ and $E_0 = [E', e_i]'$. For $k = 1$, with $S_i^2$ as the residual mean square, the R-Student statistics $t_i^2 = n e_i^2 / [(n-1) S_i^2]$ trace to Snedecor and Cochran (1968, page 157) in testing for a single shift $\{Y_i + \delta_i\}$; see also Beckman and Trussell (1974). Corresponding to $t_i^2$ for $k > 1$ are Hotelling's (1931) $T_i^2$ diagnostics, where $(n-1) S_i = Y'Y$. In having exact and well documented normal-theory operating characteristics, these remain the diagnostics of choice for many users. Initially derived under normality and the mutual independence of $[Y_1, \ldots, Y_n]$, these assumptions continue to validate the use of $T_i^2$ up to the present. For early developments see Caroni (1987), and more recently Barrett and Ling (1992) and Barrett (2003), together with references cited. Clearly independence often fails in practice, yet the need to identify outliers nonetheless persists.
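The deletion diagnostic can be sketched numerically. The following is a minimal illustration rather than the authors' code. It assumes the usual single-case deletion form $T_i^2 = (r/n)\, d_i' S_{(i)}^{-1} d_i$ with $r = n - 1$, where $d_i$ is the test row minus the mean of the remaining rows and $S_{(i)}$ is their sample covariance on $r - 1$ degrees of freedom. The function name `hotelling_t2_deletion` and the use of NumPy are choices made here for illustration only. Under this scaling the case $k = 1$ reduces to $t_i^2 = n e_i^2 / [(n-1) S_i^2]$.

```python
import numpy as np

def hotelling_t2_deletion(Y0, i):
    """Single-case deletion diagnostic for row i of the (n x k) array Y0.

    Assumed form: T_i^2 = (r/n) * d' S^{-1} d, with r = n - 1, d the test row
    minus the mean of the other rows, and S their covariance on r - 1 d.f.
    """
    Y0 = np.asarray(Y0, dtype=float)
    n, k = Y0.shape
    r = n - 1
    rest = np.delete(Y0, i, axis=0)                   # the r retained rows
    d = Y0[i] - rest.mean(axis=0)                     # deviation of the test row
    S = np.atleast_2d(np.cov(rest, rowvar=False))     # divisor r - 1
    return float((r / n) * d @ np.linalg.solve(S, d))
```

A typical call is `hotelling_t2_deletion(Y0, i=len(Y0) - 1)` to test the last row; large values point to a possible shift in that row.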
As background, precedents for this study in the case $k = 1$ include $t_i^2$ and the diagnostics of Dixon (1950), Grubbs (1950), and Ferguson (1961) based on order statistics. As reassessed in Jensen and Ramirez (2015), all remain exact in level and power under dependent ensembles of distributions in $R^n$, and for mixtures over these ensembles having star-shaped contours.
The objectives of this research are to extend those findings to include mixture distributions for Gaussian matrices $Y_0 = [Y_1, \ldots, Y_n]'$ having dependent elements with the direct-product structure $V(Y_0) = \Omega \otimes \Sigma$, and for mixtures of these. Here $\Omega$ is taken as an equicorrelation matrix $\Omega(\rho)$ or the more general $\Omega(\xi)$ to be identified. In addition, left-spherical matrix distributions serve to generalize the notion of spherical symmetry on $R^n$, thereby to encompass matrix stable laws and Cauchy distributions. For the latter the use of $T_i^2$ nonetheless is seen to remain exact in level. These findings serve to expand considerably the range of applicability of $T_i^2$ in practice, to include heavy-tailed distributions. Accordingly, normal-theory $T_i^2$ diagnostics are seen to be genuinely nonparametric. Precedents for undertaking the mixtures and dependence structures of this study trace to Box and Tiao (1968) and Aitken and Wilson (1980), who modeled data from subsamples as Gaussian mixtures. Moreover, among other venues, calibrated data subject to errors of calibration often are equicorrelated under both direct and inverse calibration, as seen in Jensen and Ramirez (2009, 2012). The importance of heavy-tailed distributions in economics and finance is highlighted in Ibragimov et al. (2015). An outline of the study follows.
Preliminary developments are given in Section 2. The principal findings follow in Section 3, and some consequences of these are detailed through examples in Section 4. Critical supporting topics, to include essential matrix distributions, are attached for completeness as an Appendix.

Notation
Spaces of note include the Euclidean n-space $R^n$; its positive orthant $R^n_+$; the real $(n \times k)$ matrices $F_{n \times k}$; the symmetric $(n \times n)$ matrices $S_n$; and their positive definite varieties $S_n^+$. Vectors and matrices are set in bold type; the transpose, inverse, trace, and determinant of $A$ are $A'$, $A^{-1}$, $\mathrm{tr}(A)$, and $|A|$. The law of a random element $Y$ is written $\mathcal{L}(Y)$; its density (pdf) and cumulative distribution function (cdf) are $g(y)$ and $G(y)$, and its characteristic function is abbreviated chf. A random matrix of order $(k \times k)$ is said to have the Wishart distribution $W_k(\nu, \Sigma, \Theta)$ of order $k$, with $\nu$ degrees of freedom, the scale parameters $\Sigma$, and the noncentrality matrix $\Theta$. Further details are supplied in Appendix A.1.

The Model
We specialize from the model of Definition 1. In particular, we take $\{Y_0 = 1_n \mu' + \mathcal{E}_0\}$, with $\mathcal{E}_0$ the matrix of random errors, and $H_n = 1_n 1_n'/n$ in keeping with the objectives of this study.
Assumptions A. The following hold for a model with a single shift.
, $\Omega(\xi)\}$; and
As in conventional deletion diagnostics, this represents a shift $\{Y_i \to Y_i + \delta_i\}$ at the design point $x_i$ (now the index $i$) in $X_0$.

Mixture Distributions
From Assumption A1 let $\Lambda = 1_n \mu' + \Delta$ and consider $g_{n \times k}(y; \Lambda, \Omega \otimes \Sigma)$ in $F_{n \times k}$ as the Gaussian density corresponding to $N_{n \times k}(\Lambda, \Omega \otimes \Sigma)$ as in Appendix A.1. These generate ensembles as $\Omega$ ranges over $\Xi$, namely $E_1$ and $E_2$. Next visualize the ensemble $E_1$ to have mixing parameters $\theta$, and $E_2$ to have mixing parameters $\xi$. Mixtures over these ensembles then arise in $F_{n \times k}$. In particular, the densities $f_1(y; \Lambda, G_1)$ and $f_2(y; \Lambda, G_2)$ are dispersion mixtures of elliptical Gaussian distributions on $F_{n \times k}$ centered at $\Lambda \in F_{n \times k}$. Let $\mathcal{G}_1$ and $\mathcal{G}_2$ comprise all cdfs on $\Gamma_1$ and $\Gamma_2$, respectively; these in turn generate the collections $M_1$ and $M_2$ comprising all dispersion mixtures of the referenced types.
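For orientation, the ensembles and mixtures just described might be written as follows. This is a sketch under stated assumptions, not a transcription of the original displays: it takes $\Gamma_1$ and $\Gamma_2$ to be the ranges of the mixing parameters $\theta$ and $\xi$, takes $G_1$ and $G_2$ to be cdfs on those ranges, and uses the standard integral form of a dispersion mixture.

```latex
\begin{align*}
E_1 &= \{\, g_{n\times k}(y;\Lambda,\Omega(\theta)\otimes\Sigma);\ \theta\in\Gamma_1 \,\}, &
E_2 &= \{\, g_{n\times k}(y;\Lambda,\Omega(\xi)\otimes\Sigma);\ \xi\in\Gamma_2 \,\}, \\
f_1(y;\Lambda,G_1) &= \int_{\Gamma_1} g_{n\times k}(y;\Lambda,\Omega(\theta)\otimes\Sigma)\,dG_1(\theta), &
f_2(y;\Lambda,G_2) &= \int_{\Gamma_2} g_{n\times k}(y;\Lambda,\Omega(\xi)\otimes\Sigma)\,dG_2(\xi), \\
M_1 &= \{\, f_1(y;\Lambda,G_1);\ G_1\in\mathcal{G}_1 \,\}, &
M_2 &= \{\, f_2(y;\Lambda,G_2);\ G_2\in\mathcal{G}_2 \,\}.
\end{align*}
```

Each $f_j$ keeps the center $\Lambda$ fixed and averages only over the dispersion structure, which is why these are called dispersion mixtures.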

Overview
, and consider arbitrary shifts These findings in turn rest on matrices of quadratic and bilinear forms of type

Properties of Residuals
The observed residuals $E_0$ under Assumptions A are germane, as $T_i^2$ is a function of these. In particular, it remains to evaluate $E(e_i)$, $\mathrm{Var}(e_i)$ and $\mathcal{L}(e_i)$ as special cases. Details follow, where $r = (n-1)$ and $B_n = I_n - H_n$.
Theorem 1. Consider the ordinary residuals $E_0 = B_n Y_0$ under Assumptions A with $\Omega \in \{\Omega(\theta), \Omega(\xi)\}$, and let $T(E_0)$ be a mapping to a linear space $V$. Then the following properties hold independently of $\Omega \in \{\Omega(\theta), \Omega(\xi)\}$.
, $\Omega(\xi)\}$, since $B_n 1_n = 0$ annihilates successive terms in expansions for $\Omega(\theta)$ and $\Omega(\xi)$ in the product terms following the first. This together with Assumption A3 gives (i). The expected product
Thus $E(e_i) = r\delta_i/n$ and $V(e_i) = \frac{r}{n}\Sigma$ is the $(n, n)$ block of $B_n \otimes \Sigma$ which, together with normality, give conclusion (ii). Conclusion (iii) follows directly.
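The moments $E(e_i) = r\delta_i/n$ and $V(e_i) = (r/n)\Sigma$ can be checked by simulation under equicorrelation. The sketch below is illustrative only. It assumes $\Omega(\theta) = I_n + \theta 1_n 1_n'$, a form consistent with the relation $\rho = \theta/(1+\theta)$ quoted in the simulation study, and it places the shift $\delta_i$ in the last row with $\mu = 0$. The function name `residual_moments` and its parameters are hypothetical.

```python
import numpy as np

def residual_moments(n, theta, Sigma, delta, N=50_000, seed=1):
    """Monte Carlo mean and covariance of the shifted-row residual e_n."""
    rng = np.random.default_rng(seed)
    k = Sigma.shape[0]
    ones = np.ones((n, n))
    Omega = np.eye(n) + theta * ones          # assumed form of Omega(theta)
    B = np.eye(n) - ones / n                  # B_n = I_n - H_n
    assert np.allclose(B @ Omega @ B, B)      # holds because B_n 1_n = 0
    Lo = np.linalg.cholesky(Omega)
    Ls = np.linalg.cholesky(Sigma)
    Lam = np.zeros((n, k))
    Lam[-1] = delta                           # single shift in the last row
    Z = rng.standard_normal((N, n, k))
    Y = Lam + Lo @ Z @ Ls.T                   # rows have covariance Omega_ij * Sigma
    e = (B @ Y)[:, -1, :]                     # residual for the shifted row
    return e.mean(axis=0), np.cov(e, rowvar=False)

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
delta = np.array([2.0, -1.0])
mean_e, cov_e = residual_moments(n=6, theta=1.0, Sigma=Sigma, delta=delta)
```

With $n = 6$ the estimates should sit close to $(5/6)\delta$ and $(5/6)\Sigma$ whatever value of $\theta$ is used, which is the independence from $\Omega$ asserted in the text.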

Nonstandard Matrix Forms
Generalizing from Lemma A.1(iii) of Jensen (2001a) and from Jensen and Ramirez (2014) in extending to include matrix arrays, the multivariate Fisher-Cochran expansion generating
where $(e_i, \delta_i)$ are of order $(k \times 1)$, and the second line explains the first. Moreover, $(A_1, A_2, A_3)$ are idempotent as given explicitly in Jensen and Ramirez (2015), namely,
To continue, designate the aforementioned matrix forms as
Mathai and Provost (1992, page 201), Jensen and Ramirez (2015) Jensen and Ramirez (2015). Theorem A.1 of the attached Appendix now lifts those results to encompass the Wishart distributions of conclusions (i)-(iv). Conclusion (v) follows directly from expressions (1) and (10), and conclusion (vi) as the Wishart analog of the noncentral properties of Theorem A.1 of Jensen and Ramirez (2015).

Invariance under Mixtures
That the $T_i^2$ diagnostics may be valid under star-contoured errors is the subject of the following.
Theorem 3. (i) Tests using $T_i^2$ remain exact in level and power for all mixtures in $M_1$; (ii) Tests using $T_i^2$ remain exact in level and power for all mixtures in $M_2$; (iii) These $T_i^2$ distributions are identical to those derived from $\mathcal{L}(Y_0) = N_{n \times k}(\Lambda, I_n \otimes \Sigma)$.

Left-Spherical Matrix Distributions
Other structured distributions are germane. This section draws heavily from Jensen and Good (1981).
Definition 2. A distribution on F n×k is left-spherical provided that L(X) = L(PX) for every orthogonal matrix P∈O(n).
Denote by $L_{n \times k}(0, I_n)$ the class of left-spherical matrix distributions centered at the origin in $F_{n \times k}$, in which case its chf has the form ϕ X = ψ(tr
Nor are these required to have moments of various orders: examples are the left-spherical stable laws on
with parameters $\{\gamma < 0,\ 0 < \alpha < 2\}$ of which the matrix Cauchy law with $\alpha = 1$ is a noteworthy special case. In addition, the shift
To continue, let $\mathcal{M}$ be a linear subspace of $F_{n \times k}$. The following is fundamental.
Definition 3. The transformation T :
The following is given as Theorem 2 of Jensen and Good (1981).
Lemma 2. Suppose $\mathcal{L}(Z) \in \{L_{n \times k}(M, I_n);\ M \in \mathcal{M}\}$, and let the transformation $T : F_{n \times k} \to V$ be translation-invariant with respect to $\mathcal{M}$ and be right-invariant under $Gl(k)$. Then the distribution of $T(Z)$ is the same for all $\mathcal{L}(Z) \in L_{n \times k}(M, I_n)$ independently of $M$, and thus is identical to its matrix normal theory form.
We next examine
with $O(r \times k)$ a matrix of zeroes as in Assumption A1. We proceed in two steps, first taking $Y_0 \to GY_0$, then the latter into $T_i^2$, namely,
As $B_r Y$ contains deviations from means, it is clear that $(n-1)$
We have the following.
(i) Then the null distribution of $T_i^2$ is the same for all $\mathcal{L}(Y_0) \in \{L_{n \times k}(M, I_n);\ M \in \mathcal{M}\}$ and thus is identical to its matrix normal theory form.
(ii) This holds in particular for every left-spherical stable law on F n×k having moments of order up to but excluding α.
Proof. Conclusion (i) follows directly from Lemma 2, where the translation invariance of $T_i^2$, and its right-invariance under $Gl(k)$, are readily apparent. Observe that developments including the definition of $\mathcal{M}$ are devoid of $\delta_i$, i.e. $\delta_i = 0$, so that Lemma 2 establishes invariance of the null distribution $\mathcal{L}(T_i^2 \mid H_0)$. Conclusion (ii) follows on recognizing that moments are not required in the developments of earlier sections. Specifically, $\Lambda = 1_n \mu' + [O', \delta_i]'$ of Assumptions A may be taken to be location and shift parameters without first moments, and the earlier dispersion parameters $I_n \otimes \Sigma$ serve instead as scale parameters of the distributions in lieu of second moments.
Remark 1. It is nothing short of remarkable that operating characteristics of $T_i^2$ should be identical under matrix Cauchy errors as under matrix Gaussian errors. The importance of heavy-tailed distributions in economics and finance is highlighted in Ibragimov et al. (2015) as noted.
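The mechanism behind Lemma 2 can be seen directly in a small numerical check. The sketch below is illustrative and is not the authors' construction. It builds one heavy-tailed left-spherical matrix as $X = Z G^{-1} + 1_n \mu'$, with $Z$ Gaussian and $G$ an independent random $(k \times k)$ factor; this choice is a hypothetical stand-in for the stable laws of the text. Because the statistic is translation-invariant and right-invariant under nonsingular $(k \times k)$ factors, it takes the same value on $X$ as on $Z$, so its null distribution cannot differ. The helper `t2_last_row` assumes the usual deletion form of $T_i^2$ with the last row as test case.

```python
import numpy as np

def t2_last_row(Y0):
    """Assumed deletion form of T_i^2, taking the last row as the test case."""
    n, k = Y0.shape
    r = n - 1
    rest = Y0[:-1]
    d = Y0[-1] - rest.mean(axis=0)
    S = np.atleast_2d(np.cov(rest, rowvar=False))   # divisor r - 1
    return float((r / n) * d @ np.linalg.solve(S, d))

rng = np.random.default_rng(2)
n, k = 10, 2
Z = rng.standard_normal((n, k))      # Gaussian errors, left-spherical
G = rng.standard_normal((k, k))      # independent random factor, a.s. nonsingular
mu = np.array([5.0, -3.0])           # common location, removed by centering
X = Z @ np.linalg.inv(G) + mu        # heavy-tailed and left-spherical about 1_n mu'

t_gauss = t2_last_row(Z)
t_heavy = t2_last_row(X)             # agrees with t_gauss up to rounding
```

The agreement holds draw by draw, not merely in distribution, which is a stronger statement than the lemma needs for this particular construction.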

Overview
Calibrated data often entail calibration curves, direct or indirect, both injecting dependencies among the calibrated measurements; see Jensen and Ramirez (2009, 2012). These apply in the analysis of univariate data. Another venue adjusts observations directly to a common standard, as in compensating for the tare weight of a scale, or in assessing yield increments relative to a control yield as in Jensen (2001b). Subsequent examples fall within the latter framework, which we develop next for multivariate data amenable to Hotelling's $T_i^2$ diagnostics. In short, observations often are adjusted to standards that are themselves random. To model this we proceed as follows: (a) Append
For Case II, if rows of $Y_0$ are to be adjusted instead against $W$ as standard, then
In what follows we parallel steps given heretofore in working from $Y_0$ to $T_i^2$.
Lemma 3. Begin with $Z_0$ as constructed; rearrange as $Z_0 = [Z', Z_i]'$ with $Z_i$ as the test case; and determine $T_i^2$ from $Z_0$ as before using $Y_0$. Then
(iii) $T_i^2$ remains exact in level and power for all mixtures in $M_1$ of (5); (iv) These $T_i^2$ distributions are identical to those initially derived from the unadjusted $\mathcal{L}(Y_0) = N_{n \times k}(\Lambda, I_n \otimes \Sigma)$.
is idempotent, to give conclusion (i).Conclusion (ii) follows directly, setting the stage so that conclusions (iii)-(iv) now follow from Theorem 3, to complete our proof.
Remark 2. Observe that a fractional adjustment {Z j = (Y j ± κ W); 1≤ j≤n} can be achieved on taking A = [I n , ±κ 1 n ].The stated conclusions follow directly if so modified.
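The effect of the adjustment matrix in Remark 2 on the row dependence can be verified in a few lines. This is a minimal sketch, assuming the standard $W$ is independent of the rows of $Y_0$ and shares their dispersion $\Sigma$, so that the row structure of $Z_0$ is $AA' = I_n + \kappa^2 1_n 1_n'$. The function name `adjustment_matrix` is hypothetical.

```python
import numpy as np

def adjustment_matrix(n, kappa=1.0, sign=-1.0):
    """A = [I_n, +/- kappa 1_n], mapping the stacked rows [Y_0', W]' to Z_0."""
    return np.hstack([np.eye(n), sign * kappa * np.ones((n, 1))])

n, kappa = 4, 1.0
A = adjustment_matrix(n, kappa)
Omega = A @ A.T                      # row dependence induced by the adjustment
theta = kappa ** 2
rho = Omega[0, 1] / Omega[0, 0]      # implied equicorrelation theta / (1 + theta)
```

With $\kappa = 1$ this gives $\theta = 1$ and $\rho = 0.5$, the values quoted for the adjusted data in the simulation study; the sign of the adjustment does not affect $AA'$.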

Simulation Studies
As developments heretofore are tedious, convoluted, and unconventional, it is instructive to demonstrate Theorem 2 and then Theorem 3. Details follow.
Accordingly, N = 40,000 random samples $Y_0 \in F_{n \times k}$ of size $n = 10$ and $k = 2$ were generated from $N_{n \times k}(0, I_n \otimes \Sigma)$ with rows as independent bivariate Gaussian vectors having zero means and dispersion matrix $\Sigma = \begin{bmatrix} 1.0 & 0.8 \\ 0.8 & 1.0 \end{bmatrix}$.
, where the shifts were added to the residual for row $i = 10$. Table 2 demonstrates that the powers for Hotelling's $T_i^2$ diagnostics under equicorrelated data are equivalent to those tabulated for independent data. Recalling that , the noncentral F probabilities were computed using the Keisan Online Calculator provided by the Casio Computer Co., Ltd.
To demonstrate the validity of $T_i^2$ in mixture distributions as in Theorem 3, N = 40,000 random samples $Y_0 \in F_{n \times k}$ of size $n = 10$ and $k = 2$ were generated from $N_{n \times k}(0, I_n \otimes \Sigma)$ having zero means and dispersion matrix Σ =
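A study of this kind can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' program: it assumes the deletion form of $T_i^2$ with reference law $T_k^2(r-1)$, the standard conversion $c_\alpha = \frac{(r-1)k}{r-k} F_\alpha(k, r-k)$, and $\Omega(\theta) = I_n + \theta 1_n 1_n'$ for the equicorrelated case. To avoid a dependency on a statistics library, the critical value uses the closed-form quantile of $F(2, m)$, so the helper applies to $k = 2$ only. All function names are hypothetical.

```python
import numpy as np

def t2_last_row_batch(Y):
    """T_i^2 for the last row of each (n x k) sample in the (N, n, k) array Y."""
    N, n, k = Y.shape
    r = n - 1
    rest = Y[:, :-1, :]
    m = rest.mean(axis=1)
    d = Y[:, -1, :] - m                                # test row minus deleted mean
    c = rest - m[:, None, :]
    S = np.einsum('sij,sil->sjl', c, c) / (r - 1)      # covariance on r - 1 d.f.
    x = np.linalg.solve(S, d[:, :, None])[:, :, 0]
    return (r / n) * np.einsum('sj,sj->s', d, x)

def critical_value_k2(n, alpha=0.05):
    """Upper-alpha point of T^2 with k = 2, using P(F(2, m) > x) = (1 + 2x/m)^(-m/2)."""
    r, k = n - 1, 2
    m = r - k
    f = (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)
    return (r - 1) * k / m * f

def rejection_rate(theta, n=10, N=40_000, rho=0.8, alpha=0.05, seed=3):
    """Empirical level of {T_i^2 >= c_alpha} when V(Y_0) = Omega(theta) ⊗ Sigma."""
    rng = np.random.default_rng(seed)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    Omega = np.eye(n) + theta * np.ones((n, n))        # assumed form of Omega(theta)
    Lo, Ls = np.linalg.cholesky(Omega), np.linalg.cholesky(Sigma)
    Y = Lo @ rng.standard_normal((N, n, 2)) @ Ls.T
    return float(np.mean(t2_last_row_batch(Y) >= critical_value_k2(n, alpha)))

level_independent = rejection_rate(theta=0.0)
level_equicorrelated = rejection_rate(theta=1.0)
```

Both empirical levels should fall near the nominal 0.05, within Monte Carlo error, illustrating that equicorrelation leaves the critical value unchanged. For general $k$ one would replace the closed-form quantile with a library F quantile.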

Running Times Example
Woodward (1970) studied the running times for $n = 22$ baseball players who ran three different paths rounding first base. These data, as used by Morrison (2005) to test for outliers, are reported in Appendix Table 7. The times that appear to be abnormal are those for Player 14 and Player 22. Using $T_i^2$ we see in Table 4 that the running times for Player 22 are indeed outlying with p-value 0.0138. Times for Player 14 are not flagged as outliers. However, the last four rows of Table 4 give $T_i^2$ for Player 14 assuming improvements in his running times in units of $\delta \in [0.1, 0.2, 0.3, 0.4]$. Beckett (1977) has identified that the $n = 22$ data points consist of two clusters, namely [2, 4, 5, 7-15, 17, 19-22] and [1, 3, 6, 16, 18], where each cluster consists of correlated data. Observe that the rank order of the times $[Y_1, Y_2, Y_3]$ for Players 1, 6, and 18 differs from the rank order for other players. In consequence, Morrison's (2005) search for outliers using $T_i^2$ is in dispute when based on the validating model of the day, namely $V(Y) = I_n \otimes \Sigma$, instead of the apparent mixing over clusters.
A fortunate conclusion from Theorems 2 and 3 is that searches for outliers using $T_i^2$ now have been validated for data sets from cohorts which are equicorrelated and, in addition, even for data sets arising as mixture distributions, in a belated
with $\rho = 0.8$. The standardizing vector $W$ was random from $N_k(0, \Sigma)$, with $\{Z_j = Y_j - W;\ 1 \le j \le n\}$ and with $Z_0 = AY_0^{\dagger}$ as equicorrelated data with $V(Z_0) = \Omega(\theta) \otimes \Sigma$, $\theta = 1.0$ and corresponding $\rho = \theta/(1+\theta) = 0.5$. Lemma 3(iii) notes that $T_i^2 = (r-1)\,\mathrm{tr}(Q_1 Q_2^{-1})$ remains exact in level and power for the equicorrelated data. Table 5 reports the tabulated and empirical critical values for this study, affirming that the critical values remain the same for data equicorrelated by adjustment to a common standard.
Table 6. Tabulated and empirical powers for $\{T_i^2 \ge c_{0.05}\}$, N = 40,000 runs, with varying shifts $\delta' = [\delta_i, \delta_i, \delta_i, \delta_i]$ for correlated $\{Z_j = Y_j - W\}$ as $Y_j$ adjusted to the standard $W$, with $V(Z_0) = \Omega(\theta) \otimes \Sigma$ and $\theta = 1.0$, $\rho = 0.5$, $n = 20$ and $k = 4$. The shifts $\delta' = [\delta_1, \ldots, \delta_4]$ used a common value $\delta_i$ and were added to the residuals for row $i = 20$.

Conclusions
The objectives set forth in the Introduction now have been met, namely, the construction of dispersion mixtures of matrix distributions under direct products of the type $V(Y_0) = \Omega \otimes \Sigma$, together with the invariance of null and nonnull distributions of $T_i^2$ under such mixtures. Additional findings establish the null distribution of $T_i^2$ to be invariant under left-spherical symmetry, encompassing matrix stable and Cauchy distributions as special cases. These findings do expand considerably the theoretical bases validating the use of $T_i^2$ in practice. To place this study in perspective, users long have recognized that classical Gaussian models often are inadequate under the exigencies of contemporary research. Early remedial efforts focused on spherical and elliptical symmetries in $R^n$, generating many well known research papers and monographs. More recent studies examine additional structural properties in $R^n$; examples include Arnold et al. (2008), Kamiya et al. (2008), Sarabia and Gómez-Déniz (2008), Richter (2013), and Richter (2014). These include mixtures, asymmetries, and various star-contoured densities in $R^n$. Among the latter are the dispersion mixtures of Jensen and Ramirez (2015), having the remarkable feature that the distributions of $t_i^2$ and those of Dixon (1950), Grubbs (1950), and Ferguson (1961) are all invariant and thus identical to their normal theory forms.
The present study breaks new ground in extending the n-dimensional mixture distributions of Jensen and Ramirez (2015) to include star-contoured matrix distributions in $F_{n \times k}$, together with invariance of the distribution of $T_i^2$. The case studies offer further insight regarding the extended uses of $T_i^2$ in practice.

A.1 Matrix Distributions
We collect basics for matrix distributions essential to the present study. First partition $Y_0$ by columns as
order $(nk \times nk)$, then for fixed $(A, B)$ and for $U = AYB'$, the corresponding moment arrays are as follows.
Proof. Conclusion (i) follows on substituting $t\,uu'$ for $T$ in the chf for $W$, together with the fact that the nonvanishing eigenvalue of $uu'$ is $u'u$. Conclusion (ii) follows on lifting from $R^1_+$ to $S^+_k$; this may be done using the characterization of Cramér and Wold (1936), as carried out in Jensen (1982). Conclusion (iii) follows from (i) and (ii) on verifying that the joint chfs of $(W_1, W_2)$ and of $(u'W_1 u, u'W_2 u)$ factor into the product of their marginal chfs.
Remark 4. The central version of conclusions (i) and (ii) was given in Result (ii) of Rao (1973, page 535).

A.2 Running Times Data
The data employed in Section 4.3 are listed here as reported in Morrison (2005, page 102).

Table 1 reports the empirical critical values for $T_i^2$ corresponding to tabulated critical values $c_\alpha$ from Theorem 2(vi) such that $\mathcal{L}(T_i^2) = T_k^2(u; r-1, 0)$. The row being evaluated for a potential shift is set to $i = 10$. Computations yielding $T_i^2$ were undertaken for each repetition, with results as summarized in Table 1.

Table 7. Running times around first base for $k = 3$ paths and $n = 22$ players.