Discriminant Analysis as an Aid to Human Resource Selection and Human Resource Turnover Minimization Decisions

A discriminant analysis is conducted to estimate a discriminant function that determines the expected status of candidates for faculty posts in a private university in Bangladesh. The explanatory variables are the age of the candidate, the salary offered for the post, whether the candidate has a foreign degree (a dummy variable), and the candidate's result in the master's examination in Bangladesh. Statistically significant differences are observed between the group means of the variables for the two groups: not stayed faculty and stayed faculty. The log determinants are approximately equal in size for the two groups, while the Box's M value shows that the assumption of equal covariance matrices is violated. However, univariate normality tests were conducted and the variables were found to follow an approximately normal distribution. Consequently, we proceeded to estimate the discriminant function. The estimated function is significant at the 1 per cent level and explains about 50 per cent of the variation in group membership. The structure matrix shows that result (0.526), f-degree (-0.489) and salary (0.408) are the most important determinants of the expected status of the faculty, and age (0.127) is the least important. Finally, the prediction matrix of the holdout sample shows that 83 per cent of the cases are classified correctly.


Introduction
As in developed countries, most organizations in Bangladesh have in recent years introduced a human resource management department in order to manage human resources efficiently and to increase the productivity of the organization. Selecting suitable human resources is one of the main functions of the human resources management department. At present, in most organizations, human resources are selected in a traditional way, based on a written test and/or an oral interview. But this selection process is creating a large human resources turnover ratio. In fact, a discussion with the head of the human resource department of the organization studied here reveals that the turnover ratio is about 30-40 per cent per year for the organization and the post examined in this research. This high turnover ratio is a big concern to the organization because it creates substantially higher human resources management costs.
A high human resources turnover ratio generates substantial costs for the organization concerned. For example, a dairy firm owner asserts, "Every time a milker leaves, I lose about a cow" (Billikope 2003). Furthermore, if a salesperson leaves the organization, the sales lost in the unmanned area range from $50,000 to $75,000 (Futrell and Parasuraman, 1984). They also reported that some organizations face 50 per cent turnover among the sales force recruited within the previous two years. In addition to the financial loss, there may be an overall reduction in organizational welfare, leakage of confidential information to competitors, and other harms. Hence an organization's initiative to reduce its human resource turnover ratio is justified for both economic and non-economic reasons. As in many other developing countries, no use of discriminant analysis in human resources selection or faculty selection has been found in Bangladesh.
In this study, an attempt is made to model the human resources selection process in order to minimize the human resources turnover ratio and to decrease human resources management costs. An old technique called linear discriminant analysis (originally developed by Fisher, 1936) is used to achieve this objective. Discriminant analysis resembles regression analysis in terms of the number of dependent variables and the number and nature of the independent variables. For instance, in both methods the independent variables can be metric or non-metric, and one dependent variable can be combined with a single independent variable or with several. Furthermore, discriminant analysis uses OLS to estimate parameters by minimizing the within-group sum of squares. But discriminant analysis differs significantly from regression analysis in the nature of the dependent variable. In linear discriminant analysis, the dependent variable is categorical (non-metric), whereas in regression analysis the dependent variable is metric. However, binary logistic regression analysis is similar to two-group discriminant analysis.
The linear discriminant function is a linear combination of the predictor variables weighted by their coefficients, as in equation 1:

\[ Z = \alpha + \sum_{i=1}^{k} \beta_i X_i \tag{1} \]

where $Z$ = discriminant score, $\alpha$ = a constant term, $\beta_i$ = the discriminant coefficient or weight of variable $i$, $X_i$ = predictor or independent variable $i$, and $i = 1, 2, 3, ..., k$ indexes the $k$ predictor variables.
The coefficients are estimated such that the function maximizes the distance between the two group centroids. That happens when the ratio ($\lambda$) of the between-group sum of squares to the within-group sum of squares is maximized. Any other combination of coefficients is not optimal, because the validity of the model cannot then be justified. The coefficients in equation 1 are unstandardized. Standardized coefficients can also be estimated, but a standardized function has no constant term because the mean of a standardized variable is zero. The larger the coefficient, the better the independent variable discriminates between the groups; a good independent variable should have a large weight.
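As a minimal sketch of how equation 1 is applied to a single case, the snippet below computes a discriminant score from a constant and a list of coefficients. The function name and all numeric values are hypothetical and chosen only for illustration; they are not the estimates reported in this study.

```python
from typing import Sequence


def discriminant_score(alpha: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """Return Z = alpha + sum(beta_i * x_i) for one case (equation 1)."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical constant, coefficients and predictor values (illustration only).
alpha = -2.0
betas = [0.5, 1.0]
case = [3.0, 1.5]

z = discriminant_score(alpha, betas, case)
print(f"Z = {z:.3f}")
```

A positive or negative Z is then compared with the group centroids, as described later in the paper.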
The expected position of a new case is determined by substituting its unstandardized values into the estimated unstandardized function, or its standardized values into the estimated standardized function. In this study, the main focus is on the unstandardized discriminant function. The number of discriminant functions (NDF) that can be estimated is at most the smaller of the number of categories (G) in the dependent variable minus one and the number of predictor variables (P):

\[ NDF \le \min(G - 1,\; P) \tag{2} \]

So, in a two-group discriminant analysis, only one function is estimated. As stated earlier, the ratio of the between-group sum of squares (SSB) to the within-group sum of squares (SSW), which is maximized when the discriminant function is estimated, is as follows:

\[ \lambda = \frac{SSB}{SSW} = \frac{\sum_{g=1}^{G} N_g \left(\bar{Z}_g - \bar{Z}\right)^2}{\sum_{g=1}^{G} \sum_{p=1}^{N_g} \left(Z_{pg} - \bar{Z}_g\right)^2} \tag{3} \]

where $G$ = number of groups; $g$ = group index, $g = 1, 2, ..., G$; $N_g$ = number of cases in group $g$; $Z_{pg}$ = score of case $p$ in group $g$, $p = 1, ..., N_g$; $\bar{Z}_g$ = group mean (centroid); and $\bar{Z}$ = overall sample mean (grand mean).
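The two quantities just described can be sketched in a few lines. The helper names and the toy scores below are assumptions made for illustration; the ratio is computed on already-available discriminant scores rather than being maximized over coefficients.

```python
from statistics import fmean


def n_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Upper bound on the number of discriminant functions: min(G - 1, P)."""
    return min(n_groups - 1, n_predictors)


def ssb_ssw_ratio(scores_by_group):
    """Ratio of between-group to within-group sum of squares of the scores."""
    all_scores = [z for group in scores_by_group for z in group]
    grand_mean = fmean(all_scores)
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in scores_by_group)
    ssw = sum((z - fmean(g)) ** 2 for g in scores_by_group for z in g)
    return ssb / ssw


# Toy discriminant scores for two groups (hypothetical values).
toy_scores = [[-1.5, -1.0, -0.5], [0.5, 1.0, 1.5]]
lam = ssb_ssw_ratio(toy_scores)
print(n_discriminant_functions(2, 4), lam)
```

With two groups and four predictors, as in this study, the bound gives a single function.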

Figure 1 shows a pictorial presentation of data collected on two variables, X1 and X2, for a discriminant analysis with two groups, G1 and G2. With two variables, the data can be plotted as in figure 1 so that the two clusters and their characteristic profiles can be inspected. The upper part of figure 1 shows that some cases of group G1 also fall in group G2. Cases that are actually members of G1 but are included in G2 are misclassified. Similarly, some cases of G2 also fall in G1, and cases that are actually members of G2 but are included in G1 are misclassified in figure 1. In other words, 13 cases are common to the two groups in figure 1. Symbolically, n(G1∩G2) = 13. Each case should belong to exactly one group, because group membership is mutually exclusive and collectively exhaustive, but the upper part of the figure places these cases in both groups, which is a misclassified presentation.
As the upper part of the figure suggests, separating the two groups graphically is not possible when data are collected on more than two variables, because a graph has only two axes to present two variables. This problem is easily solved by discriminant analysis, as shown in the lower part of figure 1. For this purpose, discriminant analysis generates a Z score, negative or positive, for each case, and a new Z axis is created when data are collected on more than two variables. In figure 1, Z scores are computed for the two groups G1 and G2. The Z values of group G1 are expected to be negative because the centroid of this group is negative; the resulting set is denoted G1′. But some G1′ cases have positive values, which means that the discriminant function misclassified those cases.
Similarly, the Z values of group G2 are expected to be positive because the centroid of this group is positive; the resulting set is denoted G2′. But some G2′ cases have negative values, which means that those cases are misclassified by the discriminant analysis. The misclassified cases are represented by the shaded area. The smaller the shaded area, the greater the accuracy of the discriminant function (Uddin, 2013; Malhotra & Das, 2011). The average of the Z values of all groups is denoted by G0.

Figure 1. Discriminant analysis
The broad objective of this study is to estimate a two-group discriminant function in order to identify efficient human resources, namely faculty who will stay in the University for the long run. Consistent with the broad objective, the specific objectives are as follows: (i) to prepare the characteristic profiles of the groups in the dependent variable, (ii) to check whether significant differences exist between the group means of the variables, (iii) to estimate a discriminant function that gives the expected status of a candidate for a faculty post in the university, (iv) to identify which variables contribute most to determining group membership, (v) to check the validity/acceptability of the estimated function, and (vi) to recommend policy implications to the policy maker.
The rest of the study is organized as follows. Section 2 reviews the literature. Section 3 describes the methodology. Section 4 presents the discriminant analysis and Section 5 concludes the study. The report ends with a list of references.

Literature Review & Variables Selection for the Study
The literature on discriminant analysis for decision making is very large. For instance, discriminant analysis is widely used for insolvency prediction and for sales force turnover management. However, we did not find any research paper presenting a discriminant function for faculty selection in a university. As a result, it is not possible to review any paper in this specific research field. Instead, several papers from other fields were studied in order to design this research, in particular to note the kinds of variables they used. These papers helped substantially in designing and conducting this research.
Lucas et al. (1987) argued that by reducing the human resource turnover ratio, the costs of human resource selection, training and development, and maintenance can be reduced substantially. To that end, they conducted a discriminant analysis to estimate a discriminant function that determines the expected status of the sales force of a selling organization. To estimate the function, they used three employee characteristic variables and four attitude variables. The three employee characteristic variables are age, education and tenure. The four attitude variables are intrinsic job satisfaction (IJS: satisfaction with work), extrinsic job satisfaction (EJS: pay, fringe benefits, job security, etc.), supervisory consideration (SC: employee-employer relationship), and task-specific self-esteem (TSSE: self-perceived quality and quantity of performance). They used three techniques to analyze their data: multiple regression analysis, MANOVA, and discriminant analysis. The discriminant analysis part of their study shows a hit ratio of more than 70 per cent, which supports the validity of using discriminant analysis in the field (Boyd, Westfall & Stasch 2005).
Welker (1974) proposed that a CPA firm use a discriminant function to select, from a large number of applicants, the candidates who might stay in the firm for the long run. For the discriminant function of a large CPA firm, he proposed the following variables. General information variables: age, asset balance, and debt balance. Performance information measures: college grade point average, class ranking in the college of business, number of hours of accounting classes in a business college, number of hours of quantitative classes in a business college, achievement test scores (professional courses), average expected hours away from home per week, expected hours of overtime per week, and size of the firm (number of employees or clients). Psychological test variables: I.Q. test results, dominance-submission rank, conformity-nonconformity rank, passivity-tension rank, and extroversion-introversion rank. Interviewer's subjective ranks of the applicant (0-10): appearance, communication, and overall. Jensen & Bailey (1975) supported Welker's (1974) proposal for employee selection but suggested using an alternative method if the assumptions are not satisfied by the data.
In this research, only four variables are included in the discriminant analysis because of lack of data. The variables are the age of the applicant (age), the salary per month for the post (salary), whether the applicant has a foreign degree, coded 1 if so and 0 otherwise (F-Degree), and the applicant's master's result in Bangladesh as a CGPA (result). The discriminant analysis shows that these variables predict group membership correctly in more than 75 per cent of cases.

Data
The study is mainly based on primary data, collected from the job application forms of the faculty by filling in a predetermined questionnaire. Secondary data are collected from books, journals, magazines, websites and the SPSS manual of George & Mallery (2006). The collected data are divided into two samples: an analysis sample and a holdout sample. There is no specific rule about the proportion of each sample in the total sample; the division may be 50-50, 60-40, or 75-25. In this study, the primary data are divided so that (1) the analysis sample contains about 70 per cent of the cases and (2) the holdout (split or validation) sample contains about 30 per cent. Each sample contains equal proportions of the two groups, not stayed (1) and stayed (2), following the rule of proportionately stratified random sampling.
The analysis sample consists of 15 not stayed (1) faculty and 15 stayed (2) faculty, and the holdout sample consists of 6 not stayed (1) faculty and 6 stayed (2) faculty. The analysis sample is used to estimate the discriminant function and the holdout sample is used to check the predictive accuracy of the model. If possible, it is better to collect data on a larger number of cases and to divide the total data set equally into an analysis sample and a holdout sample. The analysis sample is then used to estimate the discriminant function and the holdout sample is used to assess the accuracy of the estimated model. Next, the roles of the two samples are reversed: the holdout sample is used to estimate the model and the analysis sample is used to check its accuracy. This process is known as the cross-validation approach.
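The stratified division described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the study: the function name, the random seed and the use of 21 cases per group (15 analysis + 6 holdout, as reported) are assumptions made here.

```python
import random


def stratified_split(labels, analysis_share, seed=0):
    """Split case indices into analysis and holdout samples, group by group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    analysis, holdout = [], []
    for group in sorted(set(labels)):
        idx = [i for i, g in enumerate(labels) if g == group]
        rng.shuffle(idx)
        cut = round(len(idx) * analysis_share)
        analysis.extend(idx[:cut])
        holdout.extend(idx[cut:])
    return sorted(analysis), sorted(holdout)


# 21 not stayed (1) and 21 stayed (2) cases, matching the reported group sizes.
labels = [1] * 21 + [2] * 21
analysis_idx, holdout_idx = stratified_split(labels, analysis_share=15 / 21)
print(len(analysis_idx), len(holdout_idx))
```

For the cross-validation approach, the same function could be called with `analysis_share=0.5`, after which the model is estimated on each half in turn and checked on the other.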

Description of the Variables
The variables used in this study are of two types: the dependent variable and the independent variables. The only dependent variable is the status of the faculty member, which is a categorical variable. A faculty member who has left the university is denoted by not stayed (1) and one who has stayed in the university is denoted by stayed (2). There are four independent variables. Result is the candidate's GPA in the master's examination in Bangladesh. F-degree is a dummy variable for a foreign degree: based on the historical data collected from the resumes, it is 1 if the faculty member has a foreign degree and 0 otherwise. Salary is the monthly financial benefit of the faculty post. Finally, age is the age of the faculty member at the time of application.

Data Analysis Technique and Software Used
In order to answer the research questions of the study, the direct method of discriminant analysis is used. In the direct method, all of the independent variables are included in the model simultaneously. The direct method is appropriate when the researcher knows, from previous research or other sources, that the discriminant analysis should be based on all of the independent variables. In the stepwise method, by contrast, the independent variables are entered into the model according to their discriminating power. Graham (2001) argued against using stepwise discriminant analysis because of its several limitations.

Equal Variance-Covariance Matrix
The main assumption of discriminant analysis is that the groups have equal variance-covariance matrices even though their means are substantially different. This assumption is tested by using a transformed value of Box's M, called the F ratio; Box's M compares the log determinants of the categories of the dependent variable. Theoretically, this F is equivalent to the F of ANOVA, which is the ratio of between-group variability to within-group variability. In the test, the null hypothesis ($H_0$) is that the variance-covariance matrices of the groups are the same in the population. If the p-value (Sig.) of the test is less than 0.05, the null hypothesis is rejected at the 5 per cent level of significance, and the assumption of equal variance-covariance matrices is violated. This problem can be overcome as follows. (i) The violation is not a problem if it is caused by skewness rather than by outliers (Tabachnick and Fidell, 1996). (ii) If the sample size is large, the violation is unlikely to be a big problem, and the validity of the estimated discriminant function can be checked by the hit ratio of the holdout sample.
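For reference, the usual textbook form of the Box's M statistic is given below; the paper itself does not print it, so the notation is an assumption introduced here: $S_g$ is the sample covariance matrix of group $g$, $n_g$ its number of cases, $N$ the total number of cases, and $S_{\text{pooled}}$ the pooled within-group covariance matrix.

```latex
\[
M = (N - G)\,\ln\lvert S_{\text{pooled}}\rvert \;-\; \sum_{g=1}^{G} (n_g - 1)\,\ln\lvert S_g\rvert ,
\qquad
S_{\text{pooled}} = \frac{1}{N - G}\sum_{g=1}^{G} (n_g - 1)\,S_g
\]
```

The statistic is zero when every group's log determinant equals that of the pooled matrix, and it grows as the group covariance matrices diverge, which is why the log determinants are inspected alongside it.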

Multivariate Normality
An important assumption of discriminant analysis is that all of the groups in the dependent variable are selected randomly from a multivariate normal population. To test this assumption, a variable-by-variable normality check can be conducted using a histogram with a normal curve. If the variables follow a normal distribution, proceeding to estimate the function is justified on multivariate normality grounds.

No Multicollinearity
There should be no multicollinearity among the independent variables. The correlation matrix can be used to check for multicollinearity. In addition, the relationship of one independent variable with the rest can be checked by regressing that variable on the remaining independent variables and examining the $R^2$. The multicollinearity problem can also be addressed by using stepwise discriminant analysis. If multicollinearity is found in the data, it should be corrected.
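A correlation-matrix check of this kind can be sketched as below. The variable names, the toy data and the 0.8 cut-off are assumptions for illustration only; the paper does not state a threshold. For a pair of variables, the $R^2$ of regressing one on the other is simply the square of the correlation computed here.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear_pairs(data, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictor values for five cases (not the study's data).
toy = {
    "age": [25, 30, 35, 40, 45],
    "salary": [20, 24, 31, 36, 40],
    "result": [3.9, 3.2, 3.7, 3.1, 3.6],
}
flagged = flag_collinear_pairs(toy)
print(flagged)
```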

Linearity
Unlike in regression analysis, the dependent variable in discriminant analysis is non-metric. Consequently, there is no linear relationship between the dependent (non-metric) and independent (metric) variables. Instead, linear relationships are required among the independent variables. To test this assumption, we check the degree of relationship of each independent variable with every other independent variable, and of each independent variable with the rest of the independent variables taken together. If one variable is consistently found to be non-linearly related to the others, then only that variable needs attention.

Outliers
Discriminant analysis is very sensitive to outliers. Hence outliers should be identified and excluded from the analysis. According to the SPSS Base 10.0 Applications Guide, page 259, "cases with large values of Mahalanobis distance from their group mean can be identified as outliers." In addition to Mahalanobis distances, box plots and histograms can be used to identify outliers.

Sample Size
Some researchers recommend 20 cases per predictor variable, but this often cannot be achieved because of lack of data. However, Burns and Burns (2008) argued that the size of the smallest group of the dependent variable should be at least 5 times the number of predictor variables. As with regression analysis, a discriminant function should be estimated from as large a sample as possible. A small sample may produce a wrong discriminant function.
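The two rules of thumb above can be checked mechanically. The sketch below is an illustration under the reading that the smallest group needs at least 5 cases per predictor; the function and key names are hypothetical.

```python
def sample_size_checks(group_sizes, n_predictors):
    """Evaluate the two sample-size rules of thumb quoted in the text."""
    total = sum(group_sizes)
    return {
        "cases_per_predictor": total / n_predictors,
        "meets_20_per_predictor": total >= 20 * n_predictors,
        "meets_5x_smallest_group": min(group_sizes) >= 5 * n_predictors,
    }


# Analysis sample of this study: 15 not stayed, 15 stayed, four predictors.
checks = sample_size_checks([15, 15], n_predictors=4)
print(checks)
```

For the analysis sample reported in this study both checks come out false, which is in line with the authors' own remark that the sample is very small.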

Other Points to Remember
The observations must be a random sample. The categories of the dependent variable should be well defined before data collection, and all group cases must be mutually exclusive and collectively exhaustive. A continuous variable, or a variable on which data are collected on a scale, should not be divided into groups just to make discriminant analysis possible.

Group Means
Group means and standard deviations of each variable for not stayed (1) and stayed (2) faculty are reported in table 1. Group means give an idea of whether the means of the variables differ between the groups. In addition, group means and group standard deviations can be used as characteristic profiles of the two groups. Table 1 shows that the group means differ for the variables age, salary, f-degree and result. In section 4.2, the statistical significance of the mean differences is tested.

Tests of Equality of Group Means
Wilks' lambda and the F ratio are used to test the equality of the group means of each variable. Wilks' lambda for each predictor is equal to the ratio of the within-group sum of squares to the total sum of squares. It is estimated from a one-way analysis of variance in which the status variable is the independent variable and the predictor is the dependent variable. Wilks' lambda is also known as the U statistic. Its value ranges from 0 to 1. If a variable's Wilks' lambda is less than 0.95, this is taken to indicate that the group means are significantly different. The larger the value, the lower the significance; the smaller the value, the higher the significance. Wilks' lambda and the transformation of its value to F are computed as follows and presented in table 2.
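As a minimal sketch of this univariate test, the code below computes Wilks' lambda as SSW/SST for one predictor and converts it to the one-way ANOVA F ratio using the standard identity F = ((1 - Λ)/Λ)·((N - G)/(G - 1)). The function name and the toy values are assumptions; the p-value lookup is left out.

```python
from statistics import fmean


def wilks_lambda_univariate(groups):
    """Return (lambda, F, (df1, df2)) for one predictor split by group."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    sst = sum((x - grand_mean) ** 2 for x in values)          # total SS
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)  # within SS
    lam = ssw / sst
    n, k = len(values), len(groups)
    f_ratio = ((1 - lam) / lam) * ((n - k) / (k - 1))
    return lam, f_ratio, (k - 1, n - k)


# Toy values of one predictor for two groups (hypothetical).
toy_groups = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
lam, f_ratio, df = wilks_lambda_univariate(toy_groups)
print(lam, f_ratio, df)
```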
The value of Λ is transformed to an equivalent F as in MANOVA, with a simpler form when n is equal in all groups of the dependent variable; here P is the number of independent variables.
In table 4, the rank of 4 is the size of the covariance matrix: it is a (4x4) matrix because there are 4 independent variables in the discriminant function. The covariance matrices are the same if the log determinant of the not stayed group's covariance matrix equals the log determinant of the stayed group's covariance matrix. Whether the log determinants are the same is tested by Box's M, which is 30.37 for this study. The transformed F ratio is computed in order to test the equality of the two covariance matrices. This F ratio is similar to that of ANOVA, which is the ratio of between-group variability to within-group variability. In the test, the null hypothesis ($H_0$) is that the variance-covariance matrices of the groups are the same in the population. A p-value (Sig.) of 0.00 means that the null hypothesis is rejected; consequently, the assumption is violated.
However, a p-value below 0.05 does not automatically rule out estimating the discriminant function. Even when the assumption is violated, the discriminant function is often found to be valid at the validity check. This happens when the violation is due to skewness rather than outliers. With a large sample size, the violation does not prevent the model from forecasting accurately. However, since the p-value is very low, it is justified to check the univariate normality of the variables. The univariate normality tests show that some variables do not follow a normal distribution. As our sample is very small, we do not drop the cases that make the variables non-normal. If those cases could be dropped, a better discriminant function with a higher hit ratio could be estimated. This point deserves the utmost attention in real-life discriminant analysis.

Determine the Significance of the Discriminant Function
Function 1 in table 5 indicates that one function is estimated for a two-group discriminant analysis. The eigenvalue is the ratio of the between-group sum of squares to the within-group sum of squares, as in equation 3. A larger value means a better estimated discriminant function; the minimum acceptable eigenvalue is 1.00, and the higher the better. Another way to assess the eigenvalue is to check the canonical correlation of the function. The canonical correlation (r) of the estimated function is 0.72. Consequently, the coefficient of determination (= SSB/SST) of the function is $0.72^2 \approx 0.52$, which means that about half (roughly 50 per cent) of the variation in group membership is explained by the estimated function. Furthermore, Wilks' lambda (= SSW/SST) is also used to check the significance of the estimated function. Wilks' lambda for the estimated function is 0.482. The smaller the Wilks' lambda, the more significant the estimated function. Wilks' lambda is transformed to a chi-square statistic:

\[ \chi^2 = -\left[(n - 1) - 0.5\,(m + P + 1)\right] \ln \Lambda \]

where n = number of cases, m = number of discriminant functions extracted, and P = number of predictor variables, with P(G - 1) degrees of freedom. The transformed chi-square is 18.96 with 4 degrees of freedom and its level of significance (Sig.) is 0.001. In the chi-square test, the null hypothesis ($H_0$) is that the centroids of the categories and the grand centroid are all equal. So at the 1 per cent level of significance, the null hypothesis is rejected. Thus the estimated discriminant function is statistically significant and interpreting its results is justified.
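The figures quoted in this section can be cross-checked from the reported Wilks' lambda alone. The sketch below assumes the single-function (two-group) case, in which eigenvalue = (1 - Λ)/Λ and canonical r = sqrt(1 - Λ), and takes n = 30 analysis cases and P = 4 predictors from the text; the function name is hypothetical.

```python
from math import log, sqrt


def function_significance(wilks_lambda, n_cases, n_predictors, n_groups=2):
    """Eigenvalue, canonical r and chi-square for ONE discriminant function."""
    m = 1  # single function, as in a two-group analysis
    eigenvalue = (1 - wilks_lambda) / wilks_lambda
    canonical_r = sqrt(1 - wilks_lambda)
    chi2 = -((n_cases - 1) - 0.5 * (m + n_predictors + 1)) * log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return eigenvalue, canonical_r, chi2, df


eigenvalue, canonical_r, chi2, df = function_significance(0.482, 30, 4)
print(f"eigenvalue={eigenvalue:.3f} r={canonical_r:.3f} chi2={chi2:.2f} df={df}")
```

Run on the reported Λ = 0.482, this gives a canonical correlation that rounds to 0.72 and a chi-square within about 0.02 of the reported 18.96; the small gap is what one would expect from Λ having been rounded to three decimals.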

Structure Matrix
The coefficients of the structure matrix (table 6) are known as factor loadings or canonical loadings. A factor loading represents the correlation between a variable and the estimated discriminant function. By squaring a factor loading, we can determine the share of variation in the dependent variable that the variable can explain. The larger the coefficient, the more important the variable is in determining group membership. Thus the absolute size of a coefficient represents the relative importance of the variable. The factor loadings are calculated by using equation 8 and equation 9 and presented in table 6. First, the importance of each variable in the discriminant function is calculated using equation 8: the importance of a variable is equal to its weight or coefficient times the difference between the two group means of that variable. Symbolically:

\[ I_i = W_i \left(\bar{X}_{i1} - \bar{X}_{i2}\right) \tag{8} \]

where $I_i$ = importance of variable $i$, $W_i$ = coefficient of variable $i$, $\bar{X}_{i1}$ = average value of variable $i$ for group 1, $\bar{X}_{i2}$ = average value of variable $i$ for group 2, and $i = 1, 2, 3, ..., k$ indexes the predictors. A variable's relative importance is equal to its importance divided by the sum of the importance of all variables, as in equation 9. Symbolically:

\[ R_i = \frac{I_i}{\sum_{j=1}^{k} I_j} \tag{9} \]

where $R_i$ = relative importance of variable $i$.
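Equations 8 and 9 can be sketched as follows. All names and numbers are hypothetical. One assumption is added and should be read as such: the product in equation 8 is taken in absolute value so that the relative shares are non-negative and sum to one.

```python
def relative_importance(weights, means_g1, means_g2):
    """Relative importance R_i = I_i / sum(I_j), with I_i from equation 8."""
    # Assumption: absolute value is applied so that shares are non-negative.
    importance = {
        v: abs(weights[v] * (means_g1[v] - means_g2[v])) for v in weights
    }
    total = sum(importance.values())
    return {v: imp / total for v, imp in importance.items()}


# Hypothetical coefficients and group means (not the study's estimates).
weights = {"x1": 2.0, "x2": -1.0}
means_g1 = {"x1": 1.0, "x2": 5.0}
means_g2 = {"x1": 2.0, "x2": 2.0}

shares = relative_importance(weights, means_g1, means_g2)
print(shares)
```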
Alternatively, the structure matrix can be calculated as in equation 10:

\[ L = R_w D \tag{10} \]

where $L$ is the loadings matrix, $R_w$ is the within-groups correlation matrix, and $D$ is the vector of standardized discriminant function coefficients.
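Equation 10 is an ordinary matrix-vector product, which the sketch below spells out for two predictors. The correlation matrix and standardized coefficients are hypothetical values chosen for easy checking.

```python
def structure_loadings(r_within, std_coefs):
    """Loadings L = R_w D: each row of R_w dotted with the coefficient vector."""
    return [sum(r * d for r, d in zip(row, std_coefs)) for row in r_within]


# Hypothetical within-groups correlation matrix and standardized coefficients.
r_w = [
    [1.00, 0.25],
    [0.25, 1.00],
]
d = [0.6, 0.4]

loadings = structure_loadings(r_w, d)
print(loadings)
```

When the predictors are uncorrelated within groups, $R_w$ is the identity matrix and the loadings equal the standardized coefficients.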
The structure matrix (table 6) lists the variables in order from most important to least important in determining group membership. In our study, result is the most important variable, followed by foreign degree and salary. The least important variable is the age of the applicant. Squaring the coefficient of a variable gives the share of variation in the dependent variable that the variable can explain. For instance, result can explain 28 per cent ($= 0.526^2$) of the variation in group membership. In addition to the structure matrix, standardized coefficients are also used to check the relative importance of the variables.
By using the discriminant function, the HR manager can estimate the expected position of an applicant for a faculty post. For this purpose, the values of the variables taken from the application documents are substituted into equation 11. If the Z score of a candidate is negative, the expected position is not stayed (1), because the centroid of the not stayed group is negative; the greater the distance between Z and 0, the higher the likelihood of leaving the university. If the Z score is positive, the expected position of the applicant is stayed (2), because the centroid of the stayed group is positive; the greater the distance between Z and 0, the higher the likelihood that the applicant will stay in the University for the long run. Thus management can select persons whose expected status is stayed and can minimize the faculty turnover ratio.

Group Centroids
The centroid is the mean value of the discriminant scores of a particular group. There is one centroid for each group, so in our two-group discriminant analysis we have two centroids.
The discriminant scores of the analysis sample (Zs) are presented in the last column of table 9. They are calculated using equation 12. The centroid of the not stayed (1) group is -1.001 and the centroid of the stayed (2) group is 1.001. In absolute terms, the larger the Z value, the better the estimate. These centroids are used to determine the expected status of new applicants for the faculty post. When a new applicant arrives, the values of the variables from the application form or resume are substituted into the estimated function and the applicant's Z value is computed. The Z score is then compared with the centroids to reach a decision. If the Z score of the new applicant is negative, the expected position is not stayed (1), because the centroid of the not stayed (1) group is negative; if the Z score is positive, the expected position is stayed (2), because the centroid of the stayed (2) group is positive. The cutting point for the decision is zero, because the two centroids are equal in absolute value and their average is therefore zero. This is always the case in a two-group discriminant analysis when the group sample sizes are equal. If the group sample sizes are not equal, the cutting point is the weighted average of the centroids. The group centroids are presented in table 8.
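The decision rule above can be sketched as follows. The weighting used for unequal groups (each centroid weighted by the size of the other group) is a common textbook form and is an assumption here, since the paper only says "weighted average"; the function names are hypothetical.

```python
def cutting_score(centroid_a, centroid_b, n_a, n_b):
    """Cutting point between two centroids; equals their midpoint if n_a == n_b."""
    # Assumed weighting: each centroid is weighted by the OTHER group's size.
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)


def predict_group(z, cut, low_group=1, high_group=2):
    """Assign the group whose centroid lies on the same side of the cut as z."""
    return low_group if z < cut else high_group


# Centroids and group sizes reported for the analysis sample.
cut = cutting_score(-1.001, 1.001, n_a=15, n_b=15)
print(cut, predict_group(-3.397, cut), predict_group(0.068, cut))
```

With the equal group sizes of this study the cutting point is zero, so the rule reduces to the sign of Z.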

Casewise Statistics
Table 9 presents an important summary of part of the analysis. In the table, the case number is the serial number of the case in the sample. The actual group is the group to which the case belongs according to the collected data, and the predicted group is the group membership predicted for that case. A double asterisk (**) next to the predicted group marks a misprediction by the estimated discriminant function. For instance, case number 6 in the original analysis sample actually belongs to group 1, but the estimated model predicted it to be a member of group 2. The highest group is the predicted group, and its probability is the probability that the case belongs to that group. The second highest group is the group with the second highest probability of membership. Since ours is a two-group analysis, the second highest group is simply the alternative to the predicted group. For example, the model predicted case 1 to be in group 1, so the second highest group for case 1 is necessarily group 2. The second highest group probability is the probability of belonging to the second highest group. The last column of the table presents the Z scores of the analysis sample cases. The averages of the Z scores of groups 1 and 2 are the centroids of group 1 and group 2 respectively. The lower part of the table reports the same statistics, except Z, for the cross-validated sample. In the cross-validated sample, the statistics for each case are estimated from the function derived from all cases other than that case.
In addition, the Mahalanobis (1893-1972) distance (Mahalanobis, 1936) is a measure of the distance between a case and a group centroid. For a new case, the distance to each centroid is computed, and the case is assigned to the group for which the distance is smallest. The calculation takes into account that the variances differ in each direction and that the variables co-vary. The Mahalanobis distance is calculated as equation 13:
$$D_g(x) = \sqrt{(x - \bar{x}_g)^{\top} S^{-1} (x - \bar{x}_g)} \qquad (13)$$
where x is the vector of variable values of the case, $\bar{x}_g$ is the centroid of group g, and S is the pooled estimate of the covariance matrix.
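As an illustration of the distance-based assignment, the following sketch computes the Mahalanobis distance from a case to each centroid and picks the nearest group. It assumes the usual form of the distance (the square root of the quadratic form in the inverse pooled covariance matrix); the two-variable centroids, covariance matrix and case are hypothetical values chosen only to make the example run, not data from this study.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis(x, centroid, pooled_cov):
    """Distance from case x to a centroid, in standard-deviation units."""
    diff = [xi - ci for xi, ci in zip(x, centroid)]
    weighted = solve(pooled_cov, diff)  # S^{-1} (x - centroid)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


def nearest_group(x, centroids, pooled_cov):
    """Return the group whose centroid is closest, plus all distances."""
    distances = {g: mahalanobis(x, c, pooled_cov) for g, c in centroids.items()}
    return min(distances, key=distances.get), distances


# Hypothetical two-variable example (not the study's data).
centroids = {"not stayed": [0.0, 0.0], "stayed": [4.0, 0.0]}
pooled_cov = [[4.0, 0.0], [0.0, 1.0]]
group, distances = nearest_group([1.0, 1.0], centroids, pooled_cov)
print(group, distances)
```

Solving the linear system instead of inverting the covariance matrix explicitly keeps the sketch short and free of external dependencies.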
A case x is classified into one of the G groups by computing one Mahalanobis distance for each centroid. The case is assigned to the group for which the distance is minimum. The Mahalanobis distance is measured in standard deviation (SD) units. If the distance between a case and a centroid is more than 1.96, then the probability that the case belongs to the group of that centroid is less than 5 per cent.
Figure 2 shows a pictorial presentation of the Z scores of the analysis sample cases presented in the last column of table 9. The left-hand histogram presents the Z scores of the not stayed (1) group. Table 9 and the histogram show that the Z scores of this group range from -3.397 (minimum) to 0.068 (maximum), with an average of -1.001 and a standard deviation of 1.115. It is notable that the expected sign of the Z score of a case from the not stayed (1) group is negative, as the centroid of that group is negative. The more negative the Z score, the better. One case generated a positive value, which indicates that the estimated model misclassified it. So, 14 cases out of 15 of the not stayed (1) group are correctly classified by the estimated model.
The right-hand histogram presents the Z scores of the stayed (2) group reported in the last column of table 9. The table and the histogram show that the minimum Z score of the cases in the stayed (2) group is -0.111, the maximum is 2.736, the average is 1.001 and the standard deviation is 0.87. The larger the Z score, the better, as the centroid of the stayed (2) group is positive. One case of this group generated a negative value, which indicates a misclassification. Hence, 14 cases out of 15 of the stayed (2) group are classified correctly. In total, 28 of the 30 cases are classified correctly. Thus the accuracy rate or hit ratio of the estimated model for the analysis sample is 93 per cent.
Figure 2. Histogram of Z values of status 1 (not stayed) & status 2 (stayed)
In addition to the above histogram, the Z scores of the analysis sample cases can be presented as box plots, as in figure 3. The box plots are used to check the distribution of the Z scores and to identify outliers, if any exist. The box plots show that the Z scores of the not stayed (1) group do not follow a normal distribution, while the Z scores of the stayed (2) group are approximately normally distributed. Despite this violation of the normality assumption, the discriminant function is found to be significant and yields high hit ratios. The hit ratio could be higher if we excluded the outliers and the cases responsible for the non-normality of the variables. We could not do so because of the very small sample size.
In order to check the validity of the model and to assess its forecasting power, we now look at the classification (prediction or confusion) matrix. The matrix is constructed from the predictions of the estimated model for the analysis sample and is presented in table 10. The main diagonal of the matrix presents the accuracy rates of the model and the off-diagonal presents the misclassification rates. The first element of the main diagonal is the number of cases that are actually in group 1 and are predicted as group 1, divided by the number of cases in group 1; the rate is 93 per cent. The second element of the main diagonal is the number of cases that are actually in group 2 and are predicted as group 2, divided by the number of cases in group 2; the rate is 93 per cent. The total of the correctly predicted cases on the main diagonal divided by the total number of cases in the analysis sample is the correct prediction rate, also known as the hit ratio. The first off-diagonal element is the number of cases that are actually in group 1 but are predicted as group 2, divided by the number of cases in group 1; the rate is 7 per cent. The second off-diagonal element is the number of cases that are actually in group 2 but are predicted as group 1, divided by the number of cases in group 2; the rate is 7 per cent. In aggregate, the accuracy rate of the model is 93 per cent.
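The construction of the classification matrix and the hit ratio can be illustrated with a short sketch. The counts (14 of 15 cases correct in each group) come from the text above; the ordering of the cases inside the lists is arbitrary and the function names are hypothetical.

```python
from collections import Counter


def classification_matrix(actual, predicted, groups):
    """Rows are actual groups, columns are predicted groups (raw counts)."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in groups] for a in groups]


def row_rates(matrix):
    """Convert counts to rates by dividing each row by its group size."""
    return [[cell / sum(row) for cell in row] for row in matrix]


def hit_ratio(matrix):
    """Share of all cases that lie on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Analysis sample: 15 cases per group, one misclassified case in each.
actual = [1] * 15 + [2] * 15
predicted = [1] * 14 + [2] + [2] * 14 + [1]

matrix = classification_matrix(actual, predicted, groups=(1, 2))
print(matrix)
print(row_rates(matrix))
print(round(hit_ratio(matrix), 2))
```

The same functions apply unchanged to the cross-validated and holdout predictions once their actual and predicted labels are available.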
The classification matrix of the original sample may be biased because the model is estimated from a sample that includes the very case whose expected status is being predicted. So, a cross-validation classification matrix is prepared, and it is better to compare the accuracy rate of the cross-validation with the standard accuracy rate to check the validity of the model. In the cross-validation analysis, the case whose expected status is being predicted is excluded from the analysis sample used to estimate the discriminant function. The process is repeated as many times as there are cases in the analysis sample. At the end of the analysis, the cross-validation matrix is constructed. The aggregate accuracy rate of the model is 77 per cent, which is quite high compared with the minimum standard rate of 70 per cent.
The accuracy rate of the estimated function must be compared with the standard accuracy rate set by statisticians and scholars in the field to justify estimating and using the discriminant function (Uddin 2013). Furthermore, some researchers argue that the accuracy rate of the estimated function should be compared with the probability of selecting the right employee if the employee is selected randomly from the analysis sample. If the groups are equal in size, then the probability of selecting the expected employee is 1 divided by the number of groups in the analysis. In our study, the number of groups is 2 and the numbers of not stayed and stayed faculty are equal. In detail, the analysis sample contains 15 not stayed faculty and 15 stayed faculty, and the holdout sample contains 6 not stayed faculty and 6 stayed faculty. So, if selected randomly, the probability of selecting an expected employee from the analysis sample is 0.5. Joseph, William, Barry, & Ralph (2010) and Glen (2001) argued that an accuracy rate one-quarter higher than random chance justifies estimating a discriminant function. In addition, Boyd, Westfall & Stasch (2005) argued that an accuracy of more than 70 per cent is acceptable for estimating and using a discriminant function in decision making. In this case, the minimum accuracy rate is 77 per cent, which supports the validity of the model.
If both groups are equal in size, a t test can be conducted to test whether the hit ratio is higher than chance, as used by Altman (1968), as in equation 15:
$$t = \frac{P - 0.5}{\sqrt{0.5(1 - 0.5)/N}} \qquad (15)$$
where the null hypothesis (H0) is that the model's hit ratio is not higher than the chance ratio, P is the proportion of correctly predicted cases, and the degrees of freedom (df) are the total sample size (N) minus 2: N - 2. The t values for the original analysis sample (t = 4.71), the cross-validated sample (t = 2.96), and the holdout sample (t = 2.29) are statistically significant. So, the discriminant analysis produced significantly higher accuracy than chance in forecasting the expected status of the faculty post candidates.
Press's Q (Press & Wilson, 1978) is also used to compare the accuracy rate of the classification by the estimated function with the random chance rate. Let N = sample size, n = number of observations classified correctly, and K = number of groups in the dependent variable. Press's Q is then defined as equation 16:
$$\text{Press's } Q = \frac{[N - (nK)]^2}{N(K - 1)} \qquad (16)$$
To justify estimating and using the discriminant function, Press's Q should exceed the critical Chi-squared value at 1 degree of freedom, which is 6.63 at the 1 per cent level of significance. However, the decision maker should be very careful when using Press's Q if the sample size is small, because the decision may then be misleading. For the holdout sample of this study, Press's Q (5.33) is not large enough compared with the table value, although the hit ratio (83%) is significant. So, the estimation of the discriminant function and its use in faculty selection are not justified by Press's Q, but they are justified by the hit ratio and the chance criterion. This problem would not exist if large samples were used as the holdout sample and the analysis sample.
The overlap area of the group histograms of discriminant scores is very small; hence estimating and using the discriminant analysis are justified. By including more variables and a larger sample size, a better function can be estimated and used in decision making.
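The two chance-based checks discussed above can be reproduced numerically. The sketch below assumes that the t statistic is the hit ratio minus the chance rate, divided by the standard error of a proportion at the chance rate, and that Press's Q has its usual form, the square of (N minus n times K) divided by N times (K minus 1); under these assumptions the functions reproduce the values reported in the text. The function names are hypothetical.

```python
import math


def chance_t(p_correct, n_total, chance=0.5):
    """t statistic for a hit ratio against the chance rate (assumed form)."""
    standard_error = math.sqrt(chance * (1.0 - chance) / n_total)
    return (p_correct - chance) / standard_error


def press_q(n_total, n_correct, n_groups):
    """Press's Q, to be compared with a chi-squared value at 1 df."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Hit ratios and sample sizes reported in the text.
for name, p, n in [("analysis", 0.93, 30),
                   ("cross-validated", 0.77, 30),
                   ("holdout", 0.83, 12)]:
    print(name, round(chance_t(p, n), 2))

# Holdout sample: 10 of 12 cases correct, 2 groups.
print(round(press_q(12, 10, 2), 2))
```

For the holdout sample the result is below the 6.63 critical value, which is the small-sample situation the text warns about.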

Casewise Plots of the Predictors
Casewise histograms of the predictors can be drawn to check the normality of the variables. The histograms suggest that the variables are approximately normally distributed. In addition, descriptive statistics (mean, median, mode, skewness and kurtosis) are used to check the uni-variate normality of the variables. For a normally distributed variable, mean = median = mode and skewness = excess kurtosis = 0. If the variables are found to follow a uni-variate normal distribution, the estimation is worthwhile. In this case, the descriptive statistics indicate that some variables do not follow a uni-variate normal distribution exactly. We did not exclude the cases responsible for this problem because of the limited sample size.
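A minimal sketch of the descriptive normality check is given below. It uses simple moment-based formulas for skewness and excess kurtosis, which is an assumption: statistical packages usually report bias-corrected versions that differ slightly in small samples. The sample values are hypothetical and are not the study's data.

```python
import statistics


def central_moment(values, order):
    """Average of the deviations from the mean raised to a given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical values of one predictor (for illustration only).
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(statistics.mean(sample), statistics.median(sample))
print(skewness(sample), excess_kurtosis(sample))
```

Values of skewness and excess kurtosis close to zero, together with a mean close to the median, are consistent with approximate normality.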
4.6.8 Eigenvalue, Canonical Correlation, and Wilks' Lambda
The eigenvalue is an important criterion for assessing the validity of the model. Theoretically, the eigenvalue is the ratio of the between-group sum of squares to the within-group sum of squares. An eigenvalue of 0 means that the discriminant function has no discriminatory power. A higher eigenvalue means higher forecasting power of the discriminant function. The minimum acceptable eigenvalue is 1.00. In this research the eigenvalue is 1.073, the canonical correlation (r) is 0.72 and Wilks' lambda is 0.482. All of these statistics support reasonable acceptance of the estimated function. To recapitulate, by increasing the sample size and including more variables, the HR manager can increase the validity of the function.
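The three statistics are linked when there is a single discriminant function, as in a two-group analysis: the squared canonical correlation equals the eigenvalue divided by one plus the eigenvalue, and Wilks' lambda equals one divided by one plus the eigenvalue. The sketch below applies these standard identities to the reported eigenvalue; the function names are hypothetical.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a single function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalue):
    """Wilks' lambda for a single discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


eigenvalue = 1.073  # value reported in the text
r = canonical_correlation(eigenvalue)
print(round(r, 2))                         # canonical correlation
print(round(r ** 2, 2))                    # share of variation explained
print(round(wilks_lambda(eigenvalue), 3))  # Wilks' lambda
```

The squared canonical correlation of roughly one half corresponds to the statement that the function explains about 50 per cent of the variation in group membership.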

Conclusion
This study estimates a two-group discriminant function in order to identify faculty candidates who are likely to stay in the organization in the long run and consequently to reduce human resource management costs such as the costs of recruitment, training and development, and managing human resources. The function is significant at the one per cent level of significance and can explain 50 per cent of the variation in the dependent variable. A better function can be estimated by using more variables and a larger analysis sample. For example, variables such as job average (average job duration in the last three jobs), gender, race, and the time variable included in the study of Walker (1998), and the socio-economic standing variable suggested by Reiss (1961), can be included in the study. For this purpose, the head of the human resource department can take responsibility for estimating and using the function. Thus, the productivity of human resource management could be increased.
For unequal n in the groups of the dependent variable, df = N - k = total sample size - number of categories.

Figure 3. Box plots illustrating the distribution of discriminant scores for the two groups
Press's Q for Uneven Sample in Groups of Discriminant Scores
Another available tool for checking the goodness of fit of the model is the histogram of the discriminant scores of the analysis sample and the holdout sample. Figure 2 shows the histogram of the discriminant scores of the analysis sample.

Table 1. Group statistics

Table 2. Tests of equality of group means

Table 2 shows that the means of all of the variables differ very significantly between the groups. The lowest Wilks' lambda indicates the highest importance in the discriminant function. Hence, the most important variable in the discriminant function is result and the least important variable is the age of the faculty. This finding is fully supported by the p-values of the F-tests.

Test of Equality of Covariance Matrices by Using Box's M:
In order to estimate a valid discriminant function, an important assumption is that the variance-covariance matrices of the groups should be the same. The variance-covariance matrices are presented in table 3. The pooled within-group matrix is computed by taking the average of the variance-covariance matrices of the groups. An overview of the variance-covariance matrices shows that they are substantially different. We now test statistically, in table 4, whether the variance-covariance matrices are the same or different.

Table 4. Test of equality of covariance matrices by using Box's M

Table 5. Determine the significance of the discriminant function

Table 7. Canonical discriminant function coefficients (unstandardized coefficients)
shows the relative importance of the variables in the study. The main objective of this study is to estimate a discriminant function to predict the expected status of the faculty post candidates in a private university in Bangladesh. Table 7 shows the coefficients of the variables in the unstandardized discriminant function. The coefficients are the multipliers of the variables when the variables are in their original measurement units. Using the variables and their coefficients, the required discriminant function (equation 1) takes the form of equation 11. The equation is often known as the discriminator.

Table 8. Functions at group centroids
Note: Unstandardized canonical discriminant functions evaluated at group means.

Table 10. Classification results (b, c)
Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 93% of original grouped cases correctly classified.
c. 77% of cross-validated grouped cases correctly classified.
4.6.2(a) Classification Matrix of the Holdout Sample
The holdout sample is used to check the validity of the model further. By substituting the values of the holdout sample cases into the estimated model, the Z scores of the cases are calculated. Based on the Z scores and the centroids, the expected status of each case is predicted and table 11 is constructed. The table shows that 83 per cent of the cases are correctly classified. Compared with the standard, this hit ratio is very high. Hence, estimating and using the discriminant function in human resource selection is justified.

Table 11. Classification results: holdout sample
By substituting the values of the holdout sample cases into the estimated function, the casewise Z values are computed as shown in table 12. In the table, a double asterisk (**) marks a case misclassified by the estimated discriminant model. It is notable that 5 of the 6 cases of group 1 are correctly classified and 5 of the 6 cases of group 2 are correctly classified. Thus, the estimates in table 12 show that 83 per cent of the cases are correctly classified.
4.6.3 τ-Test to Compare Model's Classification Rate and Random Classification Rate
A test statistic, tau (τ), can be computed to check the acceptability of the overall classification of the model. The statistic generates a number that can be interpreted as the proportion of 'fewer errors compared to the random classification'. Assume: n_c = total number of cases correctly classified, p_g = prior probability of membership in group g, N_g = number of cases in group g, N = total sample size. Now, tau (τ) is defined as equation 14:
$$\tau = \frac{n_c - \sum_{g=1}^{G} p_g N_g}{N - \sum_{g=1}^{G} p_g N_g} \qquad (14)$$
The value of τ is large enough for the original analysis sample (= 86.67%), the cross-validated sample (= 53.33%) and the holdout sample (= 66.67%); these values indicate a substantial improvement in right faculty selection by using the estimated function.
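The tau statistic can be checked with a short computation. The sketch assumes the usual form of tau, the number of correct classifications minus the number expected by chance, divided by the total sample size minus the number expected by chance. The counts of correctly classified cases (28 and 23 of 30, and 10 of 12) are inferred from the hit ratios reported in the text, and the function name is hypothetical.

```python
def tau(n_correct, group_sizes, priors):
    """Proportional reduction in errors relative to random classification."""
    # Number of cases expected to be classified correctly by chance alone.
    expected_by_chance = sum(p * n for p, n in zip(priors, group_sizes))
    n_total = sum(group_sizes)
    return (n_correct - expected_by_chance) / (n_total - expected_by_chance)


equal_priors = [0.5, 0.5]
print(round(100 * tau(28, [15, 15], equal_priors), 2))  # analysis sample
print(round(100 * tau(23, [15, 15], equal_priors), 2))  # cross-validated sample
print(round(100 * tau(10, [6, 6], equal_priors), 2))    # holdout sample
```

Under these assumptions the three values agree with the percentages reported for the analysis, cross-validated and holdout samples.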