A Robust Alternative to the t-Test

The t-test is a classical test statistic for testing the equality of two groups. However, it is very sensitive to non-normality and to variance heterogeneity. To overcome these problems, robust methods such as the Ft and S1 test statistics can be used. This study proposes using robust estimators as the central tendency measures when comparing two groups: the trimmed mean in the Ft test and the median in the S1 test. The S1 test with MADn gave the most convincing result of all the methods. The Ft test with MADn showed results comparable to those of the conventional methods. This study has shown some improvement in the statistical solution of detecting differences between location parameters. These modified methods may serve as alternatives to other robust statistical methods that cannot handle non-normality, variance heterogeneity or unbalanced designs.


Introduction
In recent years, numerous methods have been studied for locating treatment effects, or for testing the equality of central tendency (location) parameters, while simultaneously controlling the Type I error and the power to detect treatment effects. Progress has been made in finding better methods for controlling the Type I error and the power of tests that detect treatment effects in one-way independent group designs (Babu, Padmanabhan & Puri, 1999; Othman et al., 2004; Wilcox & Keselman, 2003). Impressive theoretical developments, more flexible statistical methods and faster computers now make it possible to address serious practical problems that seemed insurmountable only a few years ago. These developments matter to applied researchers because they greatly enhance the ability to discover true differences between groups.
One way to overcome the problem of controlling Type I error rates is to use robust statistics. The literature contains several definitions of robust statistics, which unfortunately makes the meaning of the term inconsistent. Most of these definitions depend on the objectives of the particular study (Huber, 1981).
A statistical method is considered robust if its inferences are not seriously invalidated by violations of its assumptions, such as non-normality and variance heterogeneity (Scheffe, 1959). Huber (1981) defined robustness as insensitivity to small deviations from the assumptions. Brownlee (1965) described a robust procedure as one that is only slightly affected when appreciable departures from the assumptions are observed.
The theory of robust statistics deals with deviations from the assumptions of the model. It is concerned with constructing statistical procedures that remain reliable and reasonably efficient in a neighborhood of the model (Ronchetti, 2006). Hampel, Ronchetti, Rousseeuw and Stahel (1986) stated that, in a broad informal sense, robust statistics is a body of knowledge, partly formalized into "theories of robustness", relating to deviations from idealized assumptions in statistics. As mentioned by Hoel, Port and Stone (1971), a test that remains reliable under rather strong modifications of the assumptions on which it is based is said to be robust. Hence, in this study, a statistical method is considered robust when its estimators are not unduly influenced by deviations from the given assumptions during hypothesis testing.
Robust statistics has been widely used for many years. Ronchetti (2006) reported that research in robust statistics has been conducted for about 40 years and that the area is still actively studied. In a quick search of the Current Index to Statistics, Ronchetti (2006) found 1617 papers on robust statistics published between 1987 and 2001 in statistics journals and related fields.
The goal of this study is to find alternative methods for testing the equality of central tendency measures that simultaneously control the Type I error and improve power in the one-way independent group design under skewed distributions. The proposed procedures are among the latest in robust statistics: the modified Ft, proposed by Md Yusof et al. (2007), and the modified S1, proposed by Syed Yahaya (2005). Both procedures test the equality of central tendency measures. Ft uses the trimmed mean as the central tendency measure, and S1 uses the median. The Type I error rates of these methods are determined and compared for the two-group case, and their performance is further demonstrated on real education data.

Method
This paper focuses on the modified Ft and S1 methods. These combine the Ft and S1 statistics with one of the scale estimators suggested by Rousseeuw and Croux (1993).
These methods were compared in terms of Type I error under normality and under non-normality, the latter represented by skewed g-and-h distributions.

Ft Statistic
Lee and Fung (1985) introduced a procedure for comparing location parameters when non-normality occurs but the homogeneity of variances assumption still holds. They named this statistic the trimmed F statistic, Ft. Their work focused on finding the trimming percentages that produce trimmed means able to control the Type I error and give good power. They recommended the trimmed F statistic with 15% symmetric trimming as an alternative to the usual F test, especially when the distribution is long-tailed and symmetric. This method is simple and easy to program.
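To define the Ft method, let $X_{(1)j} \le X_{(2)j} \le \cdots \le X_{(n_j)j}$ be an ordered sample of group j with size $n_j$. The g-trimmed F is then defined as

$$F_t(g) = \frac{\sum_{j=1}^{J} h_j\left(\bar{X}_{tj} - \bar{X}_t\right)^2 / (J-1)}{\sum_{j=1}^{J} SSD_{tj} / (H-J)}$$

The terms are defined as follows:
- J is the number of groups.
- $h_j = n_j - g_{1j} - g_{2j}$ is the number of observations in group j remaining after trimming.
- $g_{1j}$ and $g_{2j}$ are the numbers of observations trimmed from the lower and upper tails of group j. With symmetric g-trimming, g is the proportion of observations trimmed from each tail, so $g_{1j} = g_{2j}$.
- $H = \sum_{j=1}^{J} h_j$.
- $\bar{X}_{tj} = \frac{1}{h_j}\sum_{i=g_{1j}+1}^{n_j-g_{2j}} X_{(i)j}$ is the jth group trimmed mean.
- $\bar{X}_t = \frac{1}{H}\sum_{j=1}^{J} h_j \bar{X}_{tj}$ is the grand trimmed mean.
- $SSD_{tj}$ is the g-Winsorized sum of squared deviations of group j.

$F_t(g)$ approximately follows an F distribution with $(J-1, H-J)$ degrees of freedom. The modification of Ft lies in how the trimmed mean is calculated.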
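A minimal Python sketch of the g-trimmed F with symmetric trimming makes the definition concrete. It assumes that floor(g · n_j) observations are trimmed from each tail. It computes $SSD_{tj}$ directly as the sum of squared deviations of the Winsorized sample about its own mean, which is the quantity the Winsorized sum of squared deviations measures. The function name `trimmed_f` is hypothetical. The p-value would be read from an F(J − 1, H − J) distribution, which the Python standard library does not provide.

```python
import math

def trimmed_f(groups, g=0.15):
    """Hypothetical sketch of Lee and Fung's g-trimmed F (symmetric trimming).

    Assumes floor(g * n_j) observations are trimmed from each tail of group j.
    Returns (F, df1, df2); compare F with an F(df1, df2) distribution.
    """
    J = len(groups)
    h, tmeans, ssd = [], [], []
    for x in groups:
        x = sorted(x)
        n = len(x)
        k = math.floor(g * n)              # observations trimmed from each tail
        kept = x[k:n - k]                  # the h_j = n_j - 2k retained values
        hj = len(kept)
        h.append(hj)
        tmeans.append(sum(kept) / hj)      # trimmed mean of group j
        # Winsorize: the k smallest/largest values become the nearest retained value
        w = [kept[0]] * k + kept + [kept[-1]] * k
        wbar = sum(w) / n
        ssd.append(sum((v - wbar) ** 2 for v in w))  # Winsorized SSD of group j
    H = sum(h)
    grand = sum(hj * tm for hj, tm in zip(h, tmeans)) / H   # grand trimmed mean
    between = sum(hj * (tm - grand) ** 2 for hj, tm in zip(h, tmeans)) / (J - 1)
    within = sum(ssd) / (H - J)
    return between / within, J - 1, H - J

# Usage with two small hypothetical groups, the second shifted upwards.
F, df1, df2 = trimmed_f([list(range(1, 11)), [v + 3 for v in range(1, 11)]])
print(F, df1, df2)
```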

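Trimmed Mean
Let $X_{(1)j} \le X_{(2)j} \le \cdots \le X_{(n_j)j}$ be an ordered sample of group j with size $n_j$. The MOM trimmed mean of group j is calculated as

$$\hat{\theta}_j = \frac{1}{n_j - i_1 - i_2}\sum_{i=i_1+1}^{n_j-i_2} X_{(i)j}$$

The terms are defined as follows:
- $i_1$ is the number of observations $X_{(i)j}$ such that $X_{(i)j} - \hat{M}_j < -K \times \mathrm{MAD}_{nj}$.
- $i_2$ is the number of observations $X_{(i)j}$ such that $X_{(i)j} - \hat{M}_j > K \times \mathrm{MAD}_{nj}$.
- $\hat{M}_j$ is the median of group j.
- K = 2.24 is the multiplier of the scale estimator.
- $n_j$ is the group sample size.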
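The MOM trimmed mean can be sketched in Python under the assumptions stated above, namely b = 1.4826 for MADn and K = 2.24. The function names are hypothetical. Unlike trimming with a fixed proportion, the data determine how much is trimmed from each tail ($i_1$ and $i_2$), so the trimming may be asymmetric.

```python
import statistics

def mad_n(x, b=1.4826):
    """MADn = b * median |x_i - median(x)|."""
    m = statistics.median(x)
    return b * statistics.median([abs(v - m) for v in x])

def mom_trimmed_mean(x, K=2.24):
    """Hypothetical sketch of the MOM trimmed mean of one group.

    Observations further than K * MADn from the sample median are trimmed.
    Returns (estimate, i1, i2), where i1 and i2 are the counts trimmed from
    the lower and upper tails.
    """
    m = statistics.median(x)
    cut = K * mad_n(x)
    i1 = sum(1 for v in x if v - m < -cut)      # lower-tail outliers
    i2 = sum(1 for v in x if v - m > cut)       # upper-tail outliers
    kept = [v for v in x if -cut <= v - m <= cut]
    return sum(kept) / len(kept), i1, i2

print(mom_trimmed_mean([1, 2, 3, 4, 5, 100]))  # (3.0, 0, 1)
```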
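For equal amounts of trimming in each tail of the distribution, with $g_j$ observations trimmed from each tail, the Winsorized sum of squared deviations is defined as

$$SSD_{tj} = (g_j+1)\left(X_{(g_j+1)j} - \bar{X}_{tj}\right)^2 + \sum_{i=g_j+2}^{n_j-g_j-1}\left(X_{(i)j} - \bar{X}_{tj}\right)^2 + (g_j+1)\left(X_{(n_j-g_j)j} - \bar{X}_{tj}\right)^2 - \frac{1}{n_j}\left[g_j\left(X_{(g_j+1)j} - \bar{X}_{tj}\right) + g_j\left(X_{(n_j-g_j)j} - \bar{X}_{tj}\right)\right]^2$$

When different amounts of trimming are allowed in each tail, with $g_{1j}$ observations trimmed from the lower tail and $g_{2j}$ from the upper tail, the Winsorized sum of squared deviations becomes

$$SSD_{tj} = (g_{1j}+1)\left(X_{(g_{1j}+1)j} - \bar{X}_{tj}\right)^2 + \sum_{i=g_{1j}+2}^{n_j-g_{2j}-1}\left(X_{(i)j} - \bar{X}_{tj}\right)^2 + (g_{2j}+1)\left(X_{(n_j-g_{2j})j} - \bar{X}_{tj}\right)^2 - \frac{1}{n_j}\left[g_{1j}\left(X_{(g_{1j}+1)j} - \bar{X}_{tj}\right) + g_{2j}\left(X_{(n_j-g_{2j})j} - \bar{X}_{tj}\right)\right]^2$$

where $\bar{X}_{tj}$ is the trimmed mean of group j.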
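The sketch below checks the asymmetric formula numerically. It evaluates the formula term by term and compares the result with the sum of squared deviations of the Winsorized sample about its own mean. The two agree because the correction term equals $n_j$ times the squared difference between the Winsorized mean and the trimmed mean. The function names and the data are hypothetical.

```python
def winsorized_ssd(x, g1, g2):
    """Asymmetric Winsorized SSD, evaluated term by term as in the formula.

    g1 / g2 are the numbers of observations trimmed from the lower / upper tail.
    """
    x = sorted(x)
    n = len(x)
    lo, hi = x[g1], x[n - g2 - 1]               # X_(g1+1) and X_(n-g2), 1-based
    tm = sum(x[g1:n - g2]) / (n - g1 - g2)      # trimmed mean
    middle = sum((v - tm) ** 2 for v in x[g1 + 1:n - g2 - 1])
    return ((g1 + 1) * (lo - tm) ** 2 + middle + (g2 + 1) * (hi - tm) ** 2
            - (g1 * (lo - tm) + g2 * (hi - tm)) ** 2 / n)

def winsorized_ssd_direct(x, g1, g2):
    """Same quantity: squared deviations of the Winsorized sample about its mean."""
    x = sorted(x)
    n = len(x)
    w = [x[g1]] * g1 + x[g1:n - g2] + [x[n - g2 - 1]] * g2
    wbar = sum(w) / n
    return sum((v - wbar) ** 2 for v in w)

data = [3, 1, 4, 1, 5, 9, 2, 6, 50]
print(winsorized_ssd(data, 1, 2), winsorized_ssd_direct(data, 1, 2))
```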

S 1 Statistic
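To understand S1, consider the problem of comparing location parameters for skewed distributions. Let $(X_{1j}, X_{2j}, \ldots, X_{n_jj})$ be a sample from an unknown distribution $F_j$, and let $M_j$ be the population median of $F_j$, $j = 1, \ldots, J$. To test $H_0: M_1 = M_2 = \cdots = M_J$ against $H_1: M_i \ne M_j$ for at least one pair $(i, j)$, the S1 statistic is defined as

$$S_1 = \sum_{1 \le i < j \le J} \frac{\left|\hat{M}_i - \hat{M}_j\right|}{\sqrt{\hat{\sigma}_i^2/n_i + \hat{\sigma}_j^2/n_j}}$$

where:
- $\hat{M}_j$ is the sample median of group j.
- $\hat{\sigma}_j^2 = \left(\frac{1}{n_j}\sum_{i=1}^{n_j}\left|X_{ij} - \hat{M}_j\right|\right)^2$ is the squared mean absolute deviation from the sample median $\hat{M}_j$.
- $n_j$ is the sample size of group j.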
S1 was modified by replacing the default scale estimator $\hat{\sigma}_j$ with the well-known robust scale estimator MADn.
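A Python sketch of S1 following the definition above. The scale is passed in as a function that returns $\hat{\sigma}_j^2$, so the default squared mean absolute deviation can be swapped for the square of MADn to obtain the modified S1. The names are hypothetical. The sketch does not guard against groups whose scale is exactly zero.

```python
import itertools
import math
import statistics

def mean_abs_dev_sq(x):
    """Default scale: squared mean absolute deviation from the sample median."""
    m = statistics.median(x)
    return (sum(abs(v - m) for v in x) / len(x)) ** 2

def mad_n_sq(x, b=1.4826):
    """Squared MADn, used by the modified S1."""
    m = statistics.median(x)
    return (b * statistics.median([abs(v - m) for v in x])) ** 2

def s1_statistic(groups, scale_sq=mean_abs_dev_sq):
    """Sum over pairs of |M_i - M_j| / sqrt(s_i^2 / n_i + s_j^2 / n_j)."""
    medians = [statistics.median(g) for g in groups]
    var_terms = [scale_sq(g) / len(g) for g in groups]
    total = 0.0
    for i, j in itertools.combinations(range(len(groups)), 2):
        total += abs(medians[i] - medians[j]) / math.sqrt(var_terms[i] + var_terms[j])
    return total

groups = [[1, 2, 3], [4, 5, 6]]
print(s1_statistic(groups), s1_statistic(groups, scale_sq=mad_n_sq))
```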

MADn
MADn is the median absolute deviation about the median. It attains the best possible breakdown value of 50%, twice that of the interquartile range. Its influence function is bounded, with the sharpest possible bound among all scale estimators (Rousseeuw & Croux, 1993). This robust scale estimator is given by

$$\mathrm{MAD}_n = b \, \mathrm{med}_i \left|x_i - \mathrm{med}_j \, x_j\right|$$

where i, j = 1, ..., n and the constant b = 1.4826 makes the estimator consistent for the parameter of interest (the standard deviation at the Gaussian distribution).

However, this scale estimator has drawbacks. Its efficiency is very low, only 37% at the Gaussian distribution. Rousseeuw and Croux (1993) ran a simulation on 10,000 batches of Gaussian observations to verify the efficiency gain at finite samples, comparing the variance of the standard deviation with the variance of MADn. MADn also takes a symmetric view of dispersion, so it does not seem a natural approach for problems with asymmetric distributions.
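The MADn formula translates directly into Python. The usage lines illustrate the high breakdown point: in this small example, replacing the largest observation by a gross outlier leaves MADn unchanged. The function name `mad_n` and the data are illustrative.

```python
import statistics

def mad_n(x, b=1.4826):
    """MADn = b * med_i |x_i - med_j x_j|; b makes it consistent for sigma at the normal."""
    m = statistics.median(x)
    return b * statistics.median([abs(v - m) for v in x])

clean = [1, 2, 3, 4, 5]
contaminated = [1, 2, 3, 4, 1000]   # largest value replaced by a gross outlier
print(mad_n(clean), mad_n(contaminated))  # both 1.4826
```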

Bootstrap Method
The bootstrap is a Monte Carlo method, introduced by Efron (1979), that can be used to estimate the standard error of any estimator $\hat{\theta}$. Its main advantage is simplicity. It is straightforward to apply when deriving standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios and correlation coefficients. Staudte and Sheather (1990) noted that the term bootstrap indicates that the observed data are used not only to estimate the parameter but also to generate new samples. Bootstrap methods can routinely answer questions far too complicated for traditional statistical analysis. They work the same way (without formulae) for many different statistics in many different settings. In addition, bootstrapping can help increase the accuracy of the test statistic.
When the sampling distribution of the estimator of interest is unknown, the bootstrap can estimate a pseudo sampling distribution of the estimator. This pseudo sampling distribution lets us assess the variability of the estimator, its bias and the significance of a test involving it (Efron, 1979).
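A minimal nonparametric bootstrap estimate of the standard error of the sample median illustrates the idea of a pseudo sampling distribution. The number of resamples (B = 1000), the fixed seed and the example marks are illustrative assumptions, not values from the study.

```python
import random
import statistics

def bootstrap_se(x, estimator=statistics.median, B=1000, seed=1):
    """Nonparametric bootstrap estimate of the standard error of `estimator`.

    Resamples x with replacement B times and returns the standard deviation
    of the B bootstrap replicates (the pseudo sampling distribution).
    """
    rng = random.Random(seed)
    n = len(x)
    replicates = [estimator(rng.choices(x, k=n)) for _ in range(B)]
    return statistics.stdev(replicates)

marks = [55, 62, 48, 71, 66, 59, 90, 41, 68, 73]   # hypothetical scores
print(bootstrap_se(marks))
```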
The bootstrap is known to yield a better approximation than one based on normal approximation theory (Babu & Padmanabhan, 1996; Babu et al., 1999). Othman, Keselman, Padmanabhan, Wilcox and Fradette (2003) listed two practical advantages of bootstrap methods:
i) Theory and empirical findings indicate that they can give better Type I error control than non-bootstrap methods.
ii) Some bootstrap methods do not require knowledge of the sampling distribution of the test statistic, which makes hypothesis testing quite flexible.
Westfall and Young (1993) suggested that combining bootstrap methods with methods based on trimmed means could improve Type I error control. The bootstrap seems preferable for general use if the goal is to avoid a Type I error probability greater than the nominal level (Wilcox, 1998). The strategy behind the bootstrap is to use the shifted empirical distributions to estimate an appropriate critical value (Othman et al., 2003). Keselman, Wilcox and Lix (2003) stated that obtaining critical values for the test statistic through bootstrap methods often improves Type I error control further.
This study compares bootstrap and non-bootstrap methods. The bootstrap procedure is applied to the S1 statistic, and the approximate (F-distribution) procedure is applied to the Ft statistic.
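The following is one possible sketch of a bootstrap test for the two-group S1 using shifted empirical distributions. It proceeds in three steps:
- Each group is centered at its sample median, so that the null hypothesis holds.
- The centered groups are resampled with replacement.
- The p-value is estimated as the proportion of bootstrap statistics at least as large as the observed statistic.

B = 599 mirrors the number of bootstrap samples reported for the simulations. The centering scheme, the p-value rule and the function names are assumptions, not the exact procedure of Syed Yahaya (2005).

```python
import math
import random
import statistics

def s1_two_groups(x, y):
    """Two-group S1 with the squared mean absolute deviation as scale."""
    def scale_sq(v):
        m = statistics.median(v)
        return (sum(abs(t - m) for t in v) / len(v)) ** 2
    diff = abs(statistics.median(x) - statistics.median(y))
    denom = math.sqrt(scale_sq(x) / len(x) + scale_sq(y) / len(y))
    if denom == 0:                      # degenerate resample: no spread at all
        return 0.0 if diff == 0 else math.inf
    return diff / denom

def bootstrap_s1_test(x, y, B=599, seed=2011):
    """Bootstrap p-value for H0: equal medians, using median-shifted samples."""
    rng = random.Random(seed)
    mx, my = statistics.median(x), statistics.median(y)
    cx = [v - mx for v in x]            # shift so that H0 holds in each group
    cy = [v - my for v in y]
    observed = s1_two_groups(x, y)
    exceed = sum(
        s1_two_groups(rng.choices(cx, k=len(cx)), rng.choices(cy, k=len(cy))) >= observed
        for _ in range(B)
    )
    return observed, exceed / B

x = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10, 18, 14, 12, 19, 15]   # hypothetical, n1 = 15
y = [v + 4 for v in x] + [21, 23, 17, 25, 19, 22, 18, 24, 20, 26]  # hypothetical, n2 = 25
print(bootstrap_s1_test(x, y))
```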

Empirical Investigation
This paper focuses only on unequal sample sizes and heterogeneous variances for two groups with small samples. A total sample size of N = 40 was chosen, with group sizes n1 = 15 and n2 = 25. Each method was tested under two distributions: g = 0.0 and h = 0.0 (normal), and g = 0.5 and h = 0.5 (skewed leptokurtic). Hoaglin (1985) describes the g-and-h distributions. They are transformations of the standard normal distribution. Changing the g-parameter transforms the standard normal distribution into a skewed distribution, and changing the h-parameter transforms it into a heavy-tailed distribution. For this study, 5000 datasets were simulated for each procedure, with random samples drawn using the SAS generator RANNOR (SAS Institute Inc, 1999).
To assess the Type I error, all group means were set to 0, so that the null hypothesis was true. For the S1 statistic, 599 bootstrap samples were generated.
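The g-and-h transformation of a standard normal variable Z is commonly written as $X = \frac{\exp(gZ) - 1}{g}\exp(hZ^2/2)$ for $g \ne 0$, and $X = Z\exp(hZ^2/2)$ for $g = 0$. The sketch below uses Python's `random.gauss` in place of the SAS RANNOR generator used in the study. It does not reproduce the variance patterns of the design. The function names and the seed are hypothetical.

```python
import math
import random

def g_and_h(z, g, h):
    """Transform a standard normal value z into a g-and-h value."""
    tail = math.exp(h * z * z / 2)            # h > 0 lengthens the tails
    if g == 0:
        return z * tail
    return (math.exp(g * z) - 1) / g * tail   # g > 0 skews to the right

def sample_g_and_h(n, g, h, rng):
    """Draw n observations; rng.gauss stands in for SAS RANNOR."""
    return [g_and_h(rng.gauss(0.0, 1.0), g, h) for _ in range(n)]

rng = random.Random(1999)
group1 = sample_g_and_h(15, 0.5, 0.5, rng)   # skewed leptokurtic, n1 = 15
group2 = sample_g_and_h(25, 0.5, 0.5, rng)   # n2 = 25
```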

Simulation Results
A method's robustness is judged by its ability to control the Type I error. Under Bradley's liberal criterion of robustness (Bradley, 1978), a test is considered robust if its empirical Type I error rate $\hat{\alpha}$ lies within the interval $0.5\alpha \le \hat{\alpha} \le 1.5\alpha$. For a nominal level of α = 0.05, the empirical Type I error rate should therefore be between 0.025 and 0.075. Correspondingly, a test is considered non-robust if, for any particular condition, its Type I error rate falls outside this interval. We chose this criterion because it is widely used by robust statistics researchers to judge robustness (e.g. Keselman et al., 2000; Othman et al., 2004; Syed Yahaya et al., 2004; Wilcox et al., 2000). By contrast, Guo and Luh (2000) consider a test robust if its empirical Type I error rate does not exceed 0.075. The best procedures are those that produce Type I error rates closest to the nominal (significance) level.
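Bradley's liberal criterion and the Guo and Luh (2000) variant reduce to simple checks on the empirical rate, which is the number of rejections divided by the number of simulated datasets. The helper names and the rejection count in the usage lines are hypothetical.

```python
def bradley_robust(rate, alpha=0.05):
    """Bradley's (1978) liberal criterion: 0.5 * alpha <= rate <= 1.5 * alpha."""
    return 0.5 * alpha <= rate <= 1.5 * alpha

def guo_luh_robust(rate, upper=0.075):
    """Guo and Luh (2000): robust as long as the rate does not exceed 0.075."""
    return rate <= upper

rejections = 312                     # hypothetical count out of 5000 datasets
rate = rejections / 5000
print(rate, bradley_robust(rate), guo_luh_robust(rate))
```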
Table 2 presents the Type I error rates for the two-group case. The second column of the table gives the pairing category:
- Positive pairing: the larger sample comes from the population with the larger variance, and the smaller sample from the population with the smaller variance.
- Negative pairing: the smaller sample comes from the population with the larger variance, and the larger sample from the population with the smaller variance.
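The pairing categories can be expressed as a small classification rule. The "none" label for equal sizes or equal variances, the function name and the variances in the usage lines are hypothetical, since the design values of Table 1 are not reproduced here.

```python
def pairing(n1, var1, n2, var2):
    """Classify a two-group design as 'positive', 'negative' or 'none'."""
    if n1 == n2 or var1 == var2:
        return "none"                 # equal sizes or equal variances: no pairing
    # Positive: the larger group comes from the population with the larger variance.
    return "positive" if (n1 > n2) == (var1 > var2) else "negative"

print(pairing(15, 1, 25, 4))   # positive (hypothetical variances)
print(pairing(15, 4, 25, 1))   # negative
```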

Analysis on Real Data
The modified S1 and Ft methods were also applied to real data. We compare them with a parametric method (ANOVA) and a nonparametric method (Mann-Whitney).
Two classes (groups) of Statistical Distribution Theory (2nd Semester 2010/2011) were randomly chosen. Their final marks were recorded and tested for equality between the classes. The sample sizes for Class 1 and Class 2 were 48 and 46 respectively. Table 3 gives the descriptive statistics for each group, and Table 4 gives the test results as p-values.

Table 4 shows that only the S1 with MADn method produced a significant result, rejecting the null hypothesis. This indicates that this method was able to detect the difference between the groups. The ANOVA, Mann-Whitney and Ft with MADn methods all failed to reject the null hypothesis that the performance of the groups is equal. Based on the p-values, the Mann-Whitney procedure was the least able to detect the difference (large p-value). These real-data results are consistent with the simulation results, in which S1 was more robust than the other three methods.

Discussion and Conclusions
The goal of this paper is to find alternative procedures for testing location parameters of skewed distributions that simultaneously control the Type I error and power rates. Classical methods such as the t-test and ANOVA are not robust to non-normality and heteroscedasticity. When both problems occur together, the Type I error rate can become inflated, leading to spurious rejection of the null hypothesis. The power of the test can also fall well below its theoretical value, so that real differences go undetected. Recognizing the need for a good statistic to address these problems, we combine the S1 statistic of Babu et al. (1999) and the Ft statistic of Lee and Fung (1985) with the high-breakdown scale estimators of Rousseeuw and Croux (1993). We call the resulting methods the modified S1 and Ft methods. This study has shown some improvement in the statistical solution of detecting differences between location parameters.
Regarding control of the Type I error rate, this study leads us to the following conclusions and recommendations. For both distributions (normal and extremely skewed), the robust methods (S1 with MADn and Ft with MADn) showed results comparable to those of the conventional methods (t-test and Mann-Whitney). However, on real data the S1 with MADn method gave the most convincing result of all the methods.
It is our impression that applied researchers would prefer a method that compares treatment performance across groups using a measure of the typical score based on as much of the original data as possible. The modified S1 is the best choice for this purpose because, in the two-group case, researchers can work with the original data without worrying about the shape of the distribution.

Table 1. Design specification

Table 2. Type I error rates

As shown in Table 2, the Type I error rates produced by S1 with MADn are robust under both distributions, while Mann-Whitney produced robust values for positive pairing only. For Ft with MADn and the t-test, the Type I error rates are robust under only one condition: positive pairing under the normal distribution.

Table 3. Descriptive statistics for each group

Table 4. Results of the tests using different methods