On the Error Rate Comparison of the Quadratic Discriminant Function, Euclidean Distance Classifier, Fisher’s Linear Discriminant Function and the Vine Copulas

The estimation of error rates is of vital importance in classification problems, because the error rate is the basis for choosing the best discriminant function, that is, the one with the minimum misclassification error. The quadratic discriminant function (QDF), the Euclidean Distance Classifier (EDC), and Fisher’s Linear Discriminant Function (FLDF) have long been used for classification. In this paper, we compare the misclassification error rates of the QDF, EDC, and FLDF with those of Vine Copulas based on Gaussian and Clayton models. The results are obtained for the general case in which both the mean vectors and the covariance matrices of the two populations are unequal.


Introduction
In discriminant analysis, the choice of discriminant function is decided by the associated error rates, and hence the estimation of error rates is important in classification problems. Since the advent of the "Separation Theorem" and the "Supporting Hyperplane Theorem" in mathematics, several discriminant functions have been proposed for classification over the past fifty years. Foremost among these is the quadratic discriminant function (QDF). In addition to the QDF, one can use other discriminants such as the linear discriminant function (LDF), Fisher's Linear Discriminant Function (FLDF), the Euclidean Distance Classifier (EDC) and the Absolute Euclidean Distance Classifier (AEDC) (see Ganesalingam et al. 2006), among many others. The QDF is seen to outperform the other existing methods when the covariance matrices are non-singular. However, the QDF is not applicable for discrimination when the covariance matrices are singular. Note that the AEDC is not applicable when the population means are unequal. Similarly, the LDF is not applicable when the population variance-covariance matrices are unequal.
On the other hand, pairwise Vine Copulas allow a higher-dimensional multivariate distribution to be expressed through a collection of simple two-dimensional (bivariate) copulas. Vine Copulas are increasingly used in financial and engineering modelling because they reduce a high-dimensional modelling problem to a set of two-dimensional ones. In this paper, we compare the performance of Vine Copulas with that of the QDF, EDC and FLDF. The Vine based Copulas are seen to perform reasonably well in comparison with the QDF, EDC and FLDF. They are particularly useful for discrimination when the covariance matrix in the full dimension is singular while the covariance matrices in the two-dimensional (pairwise) analysis are non-singular. In this paper, we consider a three-dimensional (full-dimension) discrimination problem.
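To make the singular-covariance situation concrete, the following minimal Python sketch uses NumPy. The covariance matrix is a hypothetical example and not one from the study. It takes $Z = X + Y$ with $X$ and $Y$ independent standard normal variables. The full $3 \times 3$ covariance matrix of $(X, Y, Z)$ is then singular, while each of its three $2 \times 2$ pairwise sub-matrices is non-singular.

```python
import numpy as np

# Hypothetical covariance of (X, Y, Z) with Z = X + Y and X, Y independent N(0, 1)
sigma = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
])

# The full 3x3 matrix has rank 2, so the inverse required by the QDF does not exist
full_rank = np.linalg.matrix_rank(sigma)
full_det = np.linalg.det(sigma)

# Determinants of the 2x2 covariance sub-matrices for each pair of variables
pair_dets = {
    (i, j): np.linalg.det(sigma[np.ix_([i, j], [i, j])])
    for i, j in [(0, 1), (1, 2), (0, 2)]
}

print("rank of full covariance:", full_rank)
print("determinant of full covariance:", round(full_det, 10))
for pair, det in pair_dets.items():
    print("pair", pair, "determinant:", round(det, 10))
```

Each pairwise determinant equals 1 here, so every bivariate analysis is well posed even though the trivariate one is not.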

Classical Discriminant Methods
Consider the problem of statistical discrimination involving two multivariate normal populations $\pi_1$ and $\pi_2$ with mean vectors $\mu_1$ and $\mu_2$ and covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. Here, we assume that $\Sigma_1 \neq \Sigma_2$. The discriminant function normally used in such a situation is the quadratic discriminant function (QDF). It allocates an object with observation vector $x$ to $\pi_1$ if

$$\frac{1}{2}\ln\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}(x-\mu_1)^{T}\Sigma_1^{-1}(x-\mu_1) + \frac{1}{2}(x-\mu_2)^{T}\Sigma_2^{-1}(x-\mu_2) \ge 0. \qquad (1.1)$$

Otherwise, it is allocated to $\pi_2$ (see, for example, Morrison (1990)). In the above allocation rule and throughout this paper, we assume that the prior probabilities of both populations are equal.
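As a minimal sketch of the QDF allocation rule for two normal populations with equal priors and known parameters, the following Python function computes the log-likelihood ratio score $\ln f_1(x) - \ln f_2(x)$. A non-negative score allocates $x$ to $\pi_1$. NumPy is assumed to be available, and the function names and parameter values are hypothetical.

```python
import numpy as np

def qdf_score(x, mu1, sigma1, mu2, sigma2):
    """QDF score ln f1(x) - ln f2(x) for normal populations (shared constants cancel)."""
    def log_kernel(mu, sigma):
        diff = x - mu
        _, logdet = np.linalg.slogdet(sigma)        # ln|Sigma|, computed stably
        quad = diff @ np.linalg.solve(sigma, diff)  # (x - mu)' Sigma^{-1} (x - mu)
        return -0.5 * logdet - 0.5 * quad
    return log_kernel(mu1, sigma1) - log_kernel(mu2, sigma2)

def qdf_allocate(x, mu1, sigma1, mu2, sigma2):
    """Allocate x to population 1 when the score is non-negative, else to population 2."""
    return 1 if qdf_score(x, mu1, sigma1, mu2, sigma2) >= 0 else 2

# Usage with hypothetical parameters
mu1, sigma1 = np.zeros(3), np.eye(3)
mu2, sigma2 = np.full(3, 2.0), 2.0 * np.eye(3)
print(qdf_allocate(np.zeros(3), mu1, sigma1, mu2, sigma2))      # prints 1
print(qdf_allocate(np.full(3, 2.0), mu1, sigma1, mu2, sigma2))  # prints 2
```

Using `solve` and `slogdet` instead of an explicit inverse and determinant is a numerical-stability choice. The rule itself is unchanged.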
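When $\Sigma_1 = \Sigma_2 = \Sigma$, the quadratic terms cancel and the rule reduces to allocating $x$ to $\pi_1$ if

$$(\mu_1-\mu_2)^{T}\Sigma^{-1}x - \frac{1}{2}(\mu_1-\mu_2)^{T}\Sigma^{-1}(\mu_1+\mu_2) \ge 0. \qquad (1.2)$$

Otherwise, it is allocated to $\pi_2$. Note that (1.2) is the linear discriminant function (LDF) rule.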
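When the two populations share a covariance matrix, the discriminant score is linear in $x$. Below is a minimal NumPy sketch of that LDF score. It assumes equal priors and a known common covariance matrix, and the function names are hypothetical.

```python
import numpy as np

def ldf_score(x, mu1, mu2, sigma):
    """LDF score (mu1 - mu2)' Sigma^{-1} x - 0.5 (mu1 - mu2)' Sigma^{-1} (mu1 + mu2)."""
    a = np.linalg.solve(sigma, mu1 - mu2)  # Sigma^{-1}(mu1 - mu2) without forming the inverse
    return a @ x - 0.5 * (a @ (mu1 + mu2))

def ldf_allocate(x, mu1, mu2, sigma):
    """Equal priors: allocate x to population 1 when the score is non-negative."""
    return 1 if ldf_score(x, mu1, mu2, sigma) >= 0 else 2

# Usage with hypothetical parameters sharing one covariance matrix
mu1, mu2 = np.zeros(3), np.array([2.0, 0.0, 0.0])
sigma = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(ldf_allocate(mu1, mu1, mu2, sigma), ldf_allocate(mu2, mu1, mu2, sigma))  # prints 1 2
```

The decision boundary passes through the midpoint $(\mu_1 + \mu_2)/2$, where the score is zero.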
When the data are normal but $\Sigma_1 \neq \Sigma_2$, the linear discriminant function (1.2), which assumes a common covariance matrix, cannot be applied. In this case, one can resort to either Fisher's Linear Discriminant Function (FLDF) or the quadratic discriminant function (QDF), both of which can be used when the covariance matrices are unequal; see Section 3.2.4 of McLachlan (1992). In the case of the FLDF, the observation vector $x$ is allocated to population $\pi_1$ when its projection onto Fisher's discriminant direction falls on the $\pi_1$ side of the cut-off point, and to $\pi_2$ otherwise. In this paper, we compare the performance of the QDF, FLDF and EDC empirically with the Vine based Gaussian and Clayton Copula models, focusing on the error rate, and we use computer simulations to make the comparison.
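The sketch below, in Python with NumPy, illustrates the FLDF and the EDC. For the FLDF it assumes the direction $a = (\Sigma_1 + \Sigma_2)^{-1}(\mu_1 - \mu_2)$ and a cut-off at the midpoint of the projected mean vectors. This is one common choice when the covariance matrices are unequal, not necessarily the exact rule used in the study. The EDC allocates $x$ to the population whose mean vector is nearest in Euclidean distance, ignoring the covariance matrices. Function names and parameter values are hypothetical.

```python
import numpy as np

def fldf_allocate(x, mu1, sigma1, mu2, sigma2):
    # Assumed Fisher direction for unequal covariances: a = (Sigma1 + Sigma2)^{-1}(mu1 - mu2)
    a = np.linalg.solve(sigma1 + sigma2, mu1 - mu2)
    # Assumed cut-off: midpoint of the two projected mean vectors
    cutoff = 0.5 * (a @ mu1 + a @ mu2)
    return 1 if a @ x >= cutoff else 2

def edc_allocate(x, mu1, mu2):
    # Euclidean Distance Classifier: nearest mean vector; covariance matrices are ignored
    d1 = np.linalg.norm(x - mu1)
    d2 = np.linalg.norm(x - mu2)
    return 1 if d1 <= d2 else 2

# Usage with hypothetical parameters
mu1, sigma1 = np.zeros(3), np.eye(3)
mu2, sigma2 = np.array([3.0, 0.0, 0.0]), 2.0 * np.eye(3)
x = np.array([0.5, 0.0, 0.0])
print(fldf_allocate(x, mu1, sigma1, mu2, sigma2), edc_allocate(x, mu1, mu2))  # prints 1 1
```

Both rules are linear in $x$. They differ in that the FLDF weights the mean difference by the pooled covariance, whereas the EDC uses plain Euclidean distance.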
In the next section, we use Vine Copulas to compute the misclassification error rate. Joe (1996, 2014), Bedford and Cooke (2001, 2002), and Kurowicka and Cooke (2006) pioneered the use of Vine Copulas for modelling high-dimensional joint distributions through a sufficient number of pairwise copulas. Two types of Vine Copulas are commonly considered: the C-Vine copula and the D-Vine copula. In a D-Vine copula, no node in any tree is connected to more than two edges, whereas in a C-Vine copula each tree has a unique node that is connected to all the other nodes in that tree.

Vine Copulas
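Here, we investigate the construction of the vine based on the Gaussian Copula. The Gaussian Copula pair density for $(u, v) \in (0,1)^2$ with correlation parameter $\rho$ is

$$c(u, v; \rho) = \frac{1}{\sqrt{1-\rho^2}} \exp\left\{-\frac{\rho^2(s^2+t^2) - 2\rho s t}{2(1-\rho^2)}\right\}, \qquad s = \Phi^{-1}(u),\; t = \Phi^{-1}(v),$$

where $\Phi$ is the standard normal distribution function.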
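Here is a minimal Python sketch of the bivariate Gaussian Copula density. It uses only the standard library, with `statistics.NormalDist` supplying $\Phi$ and $\Phi^{-1}$, and the function name is hypothetical. The usage lines check it against the ratio $\phi_2(s, t; \rho) / \{\phi(s)\phi(t)\}$ of the bivariate normal density to its standard normal margins, which is what the copula density represents.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1) and |rho| < 1."""
    s = STD_NORMAL.inv_cdf(u)
    t = STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (s * s + t * t) - 2.0 * rho * s * t) / (2.0 * one_minus_r2)
    return math.exp(exponent) / math.sqrt(one_minus_r2)

# Usage: compare with phi_2(s, t; rho) / (phi(s) * phi(t)) at a hypothetical point
u, v, rho = 0.3, 0.7, 0.5
s, t = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
bivariate_pdf = math.exp(-(s * s - 2 * rho * s * t + t * t) / (2 * (1 - rho * rho))) / (
    2 * math.pi * math.sqrt(1 - rho * rho)
)
ratio = bivariate_pdf / (STD_NORMAL.pdf(s) * STD_NORMAL.pdf(t))
print(gaussian_copula_density(u, v, rho), ratio)
```

With $\rho = 0$ the density is identically 1, which corresponds to independence.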
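For an arbitrary trivariate normal population, the conditional densities of X given Y and of Z given Y, which enter the second tree of the vine, are again normal.

For the Vine Copula based approach, the likelihood ratio under the Gaussian model is

$$\lambda(x, y, z) = \frac{f^{(1)}(x, y, z)}{f^{(2)}(x, y, z)}, \qquad (2.13)$$

and the log-likelihood ratio is given by

$$\ln \lambda(x, y, z) = \sum_{i=1}^{3} \ln\frac{f_i^{(1)}}{f_i^{(2)}} + \ln\frac{c_{12}^{(1)}}{c_{12}^{(2)}} + \ln\frac{c_{23}^{(1)}}{c_{23}^{(2)}} + \ln\frac{c_{13|2}^{(1)}}{c_{13|2}^{(2)}}, \qquad (2.14)$$

where the superscripts (1) and (2) represent populations $\pi_1$ and $\pi_2$, respectively. Here $f_i^{(k)}$ are the marginal densities and $c^{(k)}$ the Gaussian pair-copula densities of population $\pi_k$, with arguments suppressed. With equal prior probabilities, an observation is allocated to $\pi_1$ if $\ln \lambda(x, y, z) \ge 0$ and to $\pi_2$ otherwise.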
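The vine-based likelihood-ratio rule can be sketched in Python (standard library plus NumPy) as below. The sketch rests on several assumptions. Parameters are known and the margins are normal. The vine has Y as its central node, with pair copulas $c_{12}$, $c_{23}$ and $c_{13|2}$. The Gaussian pair-copula parameters are the correlations $\rho_{12}$ and $\rho_{23}$ and the partial correlation $\rho_{13|2}$ of each population's covariance matrix. This parametrisation, the clipping constant and the function names are illustrative assumptions, not necessarily the construction used in the study.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps pseudo-observations strictly inside (0, 1)

def _clip(u):
    return min(max(u, EPS), 1.0 - EPS)

def log_gauss_copula(u, v, rho):
    """Log of the bivariate Gaussian copula density."""
    s, t = STD_NORMAL.inv_cdf(_clip(u)), STD_NORMAL.inv_cdf(_clip(v))
    r2 = rho * rho
    return -0.5 * math.log(1.0 - r2) - (r2 * (s * s + t * t) - 2.0 * rho * s * t) / (2.0 * (1.0 - r2))

def h_gauss(u, v, rho):
    """Conditional distribution C(u | v) of the Gaussian pair copula."""
    z = (STD_NORMAL.inv_cdf(_clip(u)) - rho * STD_NORMAL.inv_cdf(_clip(v))) / math.sqrt(1.0 - rho * rho)
    return STD_NORMAL.cdf(z)

def gaussian_vine_log_density(obs, mu, sigma):
    """Log density of (X, Y, Z): normal margins, Gaussian c12 and c23 in tree 1, c13|2 in tree 2."""
    sd = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(sd, sd)
    r12, r23, r13 = corr[0, 1], corr[1, 2], corr[0, 2]
    # Partial correlation of X and Z given Y: the parameter of c13|2
    r13_2 = (r13 - r12 * r23) / math.sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    z = (np.asarray(obs, dtype=float) - mu) / sd
    # Sum of the three normal marginal log densities
    log_margins = float(np.sum(-0.5 * z ** 2 - np.log(sd) - 0.5 * math.log(2.0 * math.pi)))
    u1, u2, u3 = (STD_NORMAL.cdf(float(zi)) for zi in z)
    tree1 = log_gauss_copula(u1, u2, r12) + log_gauss_copula(u2, u3, r23)
    tree2 = log_gauss_copula(h_gauss(u1, u2, r12), h_gauss(u3, u2, r23), r13_2)
    return log_margins + tree1 + tree2

def gaussian_vine_allocate(obs, mu1, sigma1, mu2, sigma2):
    """Equal priors: allocate to population 1 when ln(LR) >= 0, else to population 2."""
    llr = gaussian_vine_log_density(obs, mu1, sigma1) - gaussian_vine_log_density(obs, mu2, sigma2)
    return 1 if llr >= 0 else 2

# Usage with hypothetical parameters
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([[2.0, 0.6, 0.3], [0.6, 1.0, -0.4], [0.3, -0.4, 1.5]])
print(gaussian_vine_allocate(mu, mu, sigma, mu + 5.0, sigma))
```

The h-function transforms $F(x \mid y)$ and $F(z \mid y)$ are what feed the second-tree copula $c_{13|2}$.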

Vine based on the Clayton Copula:
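Next, we investigate the vine based on the Clayton Copula. The Clayton Copula pair density with parameter $\theta > 0$ is

$$c(u, v; \theta) = (1+\theta)(uv)^{-1-\theta}\left(u^{-\theta} + v^{-\theta} - 1\right)^{-2-1/\theta},$$

and the corresponding conditional distribution function used in the second tree is $h(u \mid v; \theta) = v^{-1-\theta}\left(u^{-\theta} + v^{-\theta} - 1\right)^{-1-1/\theta}$. The likelihood ratio based on the Clayton Copula is $\lambda(x, y, z) = f^{(1)}(x, y, z)/f^{(2)}(x, y, z)$, where each vine density is built from Clayton pair copulas and the superscripts (1) and (2) represent populations $\pi_1$ and $\pi_2$, respectively.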
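For the Clayton model, the sketch below (Python, standard library plus NumPy) builds a three-dimensional vine with normal margins and Clayton pair copulas, with Y as the central node. How the Clayton parameters are linked to each population's covariance matrix is an assumption here. The sketch uses a hypothetical Kendall's-tau calibration $\theta = 2\tau/(1-\tau)$ with $\tau = (2/\pi)\arcsin\rho$, which requires positive correlations and partial correlation. All function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps pseudo-observations strictly inside (0, 1)

def _clip(u):
    return min(max(u, EPS), 1.0 - EPS)

def log_clayton_copula(u, v, theta):
    """Log of the Clayton copula density, theta > 0."""
    u, v = _clip(u), _clip(v)
    a = u ** -theta + v ** -theta - 1.0
    return math.log1p(theta) - (1.0 + theta) * math.log(u * v) - (2.0 + 1.0 / theta) * math.log(a)

def h_clayton(u, v, theta):
    """Conditional distribution C(u | v) = dC(u, v)/dv of the Clayton copula."""
    u, v = _clip(u), _clip(v)
    return v ** (-1.0 - theta) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 - 1.0 / theta)

def theta_from_rho(rho):
    """Hypothetical calibration: Gaussian tau = 2 asin(rho) / pi matched to Clayton
    tau = theta / (theta + 2). Only valid for rho > 0."""
    tau = 2.0 * math.asin(rho) / math.pi
    return 2.0 * tau / (1.0 - tau)

def clayton_vine_log_density(obs, mu, sigma):
    """Normal margins with Clayton pair copulas c12, c23 (tree 1) and c13|2 (tree 2)."""
    sd = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(sd, sd)
    r12, r23, r13 = corr[0, 1], corr[1, 2], corr[0, 2]
    r13_2 = (r13 - r12 * r23) / math.sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    t12, t23, t13_2 = (theta_from_rho(float(r)) for r in (r12, r23, r13_2))
    z = (np.asarray(obs, dtype=float) - mu) / sd
    log_margins = float(np.sum(-0.5 * z ** 2 - np.log(sd) - 0.5 * math.log(2.0 * math.pi)))
    u1, u2, u3 = (STD_NORMAL.cdf(float(zi)) for zi in z)
    tree1 = log_clayton_copula(u1, u2, t12) + log_clayton_copula(u2, u3, t23)
    tree2 = log_clayton_copula(h_clayton(u1, u2, t12), h_clayton(u3, u2, t23), t13_2)
    return log_margins + tree1 + tree2

def clayton_vine_allocate(obs, mu1, sigma1, mu2, sigma2):
    """Equal priors: allocate to population 1 when ln(LR) >= 0, else to population 2."""
    llr = clayton_vine_log_density(obs, mu1, sigma1) - clayton_vine_log_density(obs, mu2, sigma2)
    return 1 if llr >= 0 else 2

# Usage with hypothetical parameters (all correlations 0.5, so every theta is positive)
mu1 = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
mu2 = np.full(3, 4.0)
print(clayton_vine_allocate(mu1, mu1, sigma, mu2, sigma))
```

Unlike the Gaussian case, the Clayton vine does not reproduce the trivariate normal density. The Clayton copula has lower-tail dependence, so the two constructions can allocate observations differently.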

Numerical Results
In this section, we report the results for several covariance matrices, based on 1000 simulation runs, each containing simulated samples of size 1000. Note that the mean vectors and the covariance matrices of the two populations are unequal. We generate these random samples from multivariate normal populations using the mean vectors listed below and the covariance matrices listed in the table. In other words, the mean vectors and the covariance matrices are assumed to be known in our simulation. We decompose each covariance matrix using the well-known Cholesky decomposition and then generate the vector components from a normal distribution using the statistical software SAS. The error rate P12 (classifying an observation from population $\pi_1$ as population $\pi_2$) was calculated empirically by averaging the error rate obtained when the rule given by (1.1) is applied. Similarly, the error rate P12 for the Vine Copula based on the Clayton model and on the Gaussian model was calculated empirically by averaging the error rate obtained when the rule given by (2.14) is applied.
In a similar fashion, we calculate the error rate P21 (classifying an observation from population $\pi_2$ as population $\pi_1$) empirically by interchanging the corresponding terms. We present the numerical results for p = 3. The mean vectors used for all the covariance matrices studied in the case p = 3 are as follows.
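The simulation design can be sketched in Python with NumPy, whereas the study itself used SAS. Normal samples are generated through the Cholesky factor of each covariance matrix, and the proportion of misclassified observations is then averaged over runs. The `allocate` argument is a hypothetical interface for any rule that maps one observation to population 1 or 2. The defaults of 1000 runs of samples of size 1000 follow the text. Smaller values are convenient for quick checks because the per-observation loop is slow.

```python
import numpy as np

def simulate_normal(rng, mu, sigma, n):
    """Draw n observations from N(mu, sigma) via x = mu + L z with sigma = L L' (Cholesky)."""
    chol = np.linalg.cholesky(sigma)
    z = rng.standard_normal((n, len(mu)))
    return mu + z @ chol.T

def estimate_error_rates(allocate, mu1, sigma1, mu2, sigma2, n=1000, runs=1000, seed=0):
    """Average empirical P12 (pi_1 classified as pi_2) and P21 (pi_2 classified as pi_1)."""
    rng = np.random.default_rng(seed)
    p12_total = p21_total = 0.0
    for _ in range(runs):
        sample1 = simulate_normal(rng, mu1, sigma1, n)
        sample2 = simulate_normal(rng, mu2, sigma2, n)
        p12_total += np.mean([allocate(x) != 1 for x in sample1])
        p21_total += np.mean([allocate(x) != 2 for x in sample2])
    return p12_total / runs, p21_total / runs

# Usage with hypothetical parameters and the Euclidean Distance Classifier as the rule
mu1, mu2 = np.zeros(3), np.array([2.0, 0.0, 0.0])
sigma = np.eye(3)

def edc(x):
    return 1 if np.linalg.norm(x - mu1) <= np.linalg.norm(x - mu2) else 2

p12, p21 = estimate_error_rates(edc, mu1, sigma, mu2, sigma, n=1000, runs=20)
print(p12, p21)  # both should be close to Phi(-1), about 0.159, for this configuration
```

P21 is obtained from the same loop by interchanging the roles of the two populations. This matches the text's description of computing it "by interchanging the corresponding terms".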

Discussion and Conclusion
As can be seen from Table 1, the Vine based Gaussian Copula seems to outperform the existing discriminant methods, namely the Quadratic Discriminant Function (QDF), the Euclidean Distance Classifier (EDC) and Fisher's Linear Discriminant Function (FLDF). It does so on both the error rate P12 (classifying population $\pi_1$ as population $\pi_2$) and the error rate P21 (classifying population $\pi_2$ as population $\pi_1$). In this study, the populations have unequal means and unequal covariance matrices. We therefore did not consider the Absolute Euclidean Distance Classifier (AEDC), which requires the population means to be equal, nor the Linear Discriminant Function (LDF), which requires the covariance matrices to be equal. In place of the LDF, we considered Fisher's Linear Discriminant Function (FLDF), as it does not require the covariance matrices to be equal. The QDF is acceptable as long as the covariance matrices are non-singular. In real-life high-dimensional problems, however, the variables are often strongly correlated, so the covariance matrix may be singular or nearly singular. This can render the QDF unsuitable for discriminant analysis. In such cases, an alternative is to use two-dimensional pairwise Vine Copulas for the discriminant analysis. In our study, the two-dimensional Vine based Gaussian Copula performed better than the QDF, EDC and FLDF as a discriminant, which makes it a good choice.
The D-Vine and C-Vine copulas coincide when the dimension $p = 3$. More recently, Acar et al. (2012) and Stoeber and Czado (2013) examined the limitations, extensions and applications of Vine Copulas. This motivated us to consider the applicability of Vine Copulas in the context of discriminant analysis. Here, we compare the performance of the Vine based Clayton and Gaussian Copulas against existing discriminant methods such as the Quadratic Discriminant Function (QDF), Fisher's Linear Discriminant Function (FLDF) and the Euclidean Distance Classifier (EDC).

Let $f(x, y, z)$ denote the joint density of $(X, Y, Z)$, with marginal densities $f_1, f_2, f_3$ and marginal distribution functions $F_1, F_2, F_3$. Then, by using the properties of Vine copulas, one can write

$$f(x, y, z) = f_1(x)\, f_2(y)\, f_3(z)\; c_{12}\{F_1(x), F_2(y)\}\; c_{23}\{F_2(y), F_3(z)\}\; c_{13|2}\{F_{1|2}(x \mid y), F_{3|2}(z \mid y)\},$$

where $c_{12}$ is the pairwise copula density of X and Y, $c_{23}$ is the pairwise copula density of Y and Z, and $c_{13|2}$ is the pairwise copula density of X and Z given Y.

Vine based on the Gaussian Copula: