Long-term Examination of Bank Crashes Using Panel Logistic Regression: Turkish Banks Failure Case

Crises in the financial sector over the last two decades have shown the importance of early warning systems, especially for bank failures. This study aims to develop an early warning system for Turkish commercial bank failures using panel data from 2002 to 2012. The data were analyzed using pooled logistic regression and random-effect panel logistic regression. The dependent variable was bank failure, defined using the return-on-assets ratio. Factor analysis was used to construct independent variables from financial ratios. Seven meaningful factors were found: Interest income and expenditures, Equity, Other income and expenditures, Balance sheet, Deposit, Due, and Asset quality. When the focus is sensitivity, the best prediction performance was obtained using random-effect logistic regression.


Introduction
At the time of the Asian crisis, the Turkish banking system was seriously damaged. In 2007, a new economic crisis started in the United States and affected the whole world, and it was not known when it would end. It is still not clear which banks will survive and which will not.
There are several types of analysis that aim to discriminate between healthy and unhealthy banks. These studies were pioneered by Beaver (1966) for the univariate case and by Altman (1968) for the multivariate case. While Altman developed his model for manufacturing entities, Sinkey's (1975) model aimed to predict bank failure. For a fuller review of the literature, see Balcaen and Ooghe (2006), Bellovary et al. (2007), Demyanyk and Hasan (2010), and Gepp and Kumar (2012).
There are some cross-sectional bank-failure studies that focus on the period before the Asian crisis, mainly between 1997 and 2001. These studies include the work done by Canbas et al. (2005), Celik and Karatepe (2007), Erdogan (2008), Boyacioglu et al. (2009), Erdogan (2012), and Inan and Erdogan (2013). There are very few panel data applications for bank bankruptcy related to the Turkish banking system. Büyükşalvarcı and Abdioglu (2013) undertook a panel data analysis to investigate the determinants of Turkish banks' capital-adequacy ratio. The results of their study show that loans, return on equity, and leverage have negative effects on the capital-adequacy ratio, while loan loss reserve and return on assets have positive effects. Ilk et al. (2013) investigated bank failures in Turkey using longitudinal data structures. Using principal component analysis to avoid multicollinearity, they fitted three statistical models: logistic regression, generalized estimating equations, and marginalized transition models. All three models seemed to perform equally well according to the overall correct classification rate.
The objective of this study is to implement a panel data analysis using pooled logistic regression and random-effect logistic regression on the financial ratios of Turkish commercial banks from 2000 to 2012. During the first period, that is, the period before 2001, the Turkish Saving Deposit Insurance Fund (SDIF) took over all unsuccessful Turkish banks. Thus, all studies of Turkish commercial banks during this period defined a failed bank as one that was taken over by the SDIF. During the second period, that is, the period after 2001, too few banks were taken over by the SDIF to define failure as "being taken over by SDIF". Instead, this study uses a profit measure called Return on Assets (ROA), defined as the ratio of net profit after taxes to assets. This study is therefore distinguished from previous studies both in terms of time period and failure definition.
Section 1 reviews the literature. Section 2 introduces the theoretical aspects of the models used in the study. Section 3 discusses the data preparation operations, data sampling, and the selection of financial ratios. Section 4 discusses the results. Finally, Section 5 provides conclusions.

Panel Data Structures
This study considers a panel data structure. The data were taken from the website of the Banks Association of Turkey. With regression data, a cross-section of subjects is collected at a specific time. In contrast, with time series data, subjects are observed over time. Longitudinal/panel data combine both regression and time series data: as with regression, a cross-section of subjects is collected, and each subject is then observed over time.
Panel data have benefits and drawbacks. Some advantages of using longitudinal data include studying dynamic relationships, studying heterogeneity, reducing the omitted-variable bias, and having more efficient estimators (Frees, 2004). As for drawbacks, selection bias may occur if simple random sampling is not used to select observational units. In addition, because the same subjects are followed over time, nonresponses typically increase over time.

Panel Data Model
In a panel data structure, observations are indexed with both i and t (Baltagi, 1995).
$$y_{it} = \beta_{0i} + x_{it}'\beta + \varepsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T$$
Here i represents the cross-section units, such as households, individuals, firms, or countries, and t represents the time series. Some variables vary only across individuals or across time, and some vary with both individuals and time.
One key feature of the model is that we allow each individual i to have a distinct intercept $\beta_{0i}$. This intercept includes all aspects of unobserved heterogeneity that are fixed over the length of the panel.
A different panel data model, in error-components form, can be given as below.
$$y_{it} = \beta_0 + x_{it}'\beta + \alpha_i + \varepsilon_{it}$$
Here $\alpha_i$ includes all fixed, omitted variables.

Panel Logistic Regression Analysis
In some economic studies, the dependent variable is discrete, indicating, for example, that an individual defaulted on a loan or was denied credit, or that a bank is successful or not. This dependent variable is usually represented by a binary-choice variable $y_{it} = 1$ if the event happens and 0 if it does not for individual i at time t (Baltagi, 1995). The probability that the event happens for individual i at time t, $p_{it}$, is usually modeled as a function of some explanatory variables.
Generally, a logistic cumulative distribution function is used, which constrains $F(x_{it}'\beta)$ to between zero and one.
$$p_{it} = \Pr(y_{it} = 1 \mid x_{it}) = F(x_{it}'\beta) = \frac{e^{x_{it}'\beta}}{1 + e^{x_{it}'\beta}}$$
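As a small illustration of how the logistic function maps a linear index into a probability, the following sketch evaluates the logistic cumulative distribution function for one observation. The covariate values and coefficients are hypothetical and are not estimates from this study.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic CDF F(z) = e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def event_probability(x, beta):
    """Probability of the event given covariates x and coefficients beta."""
    index = sum(xj * bj for xj, bj in zip(x, beta))  # linear index x'beta
    return logistic_cdf(index)

# Hypothetical values: a constant term plus two factor scores.
x_it = [1.0, 0.5, -1.2]
beta = [-0.3, 0.8, 0.4]
p_it = event_probability(x_it, beta)
print(round(p_it, 3))
```

Whatever the value of the linear index, the result stays strictly between zero and one for moderate inputs, which is the property the text relies on.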
For the panel data structure, when the dependent variable is binary, the first choice for constructing a discriminating model is the pooled logistic regression. With this method, the panel structure of the data is ignored.
A Fixed Effects (FE) estimator can be used with either the distinct intercepts or the error components if the heterogeneity term is correlated with the explanatory variables. One approach is to estimate a separate intercept for each individual. Alternatively, the Fixed Effects estimator cancels out the heterogeneity term by differencing observations for the same individual. It is possible to use all the observations for each individual if the individual's specific mean is subtracted from each observation.
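The mean-subtraction idea can be sketched for a linear panel model. The example below is only an illustration on made-up data with hypothetical bank-specific intercepts: it subtracts each bank's mean from its observations, which removes the bank-specific term, and then recovers the common slope. It does not carry over directly to the logistic case.

```python
import numpy as np

# Hypothetical panel: 3 banks observed for 3 periods each.
bank = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
x = np.array([1.0, 2.0, 4.0, 0.5, 1.5, 2.0, 3.0, 5.0, 6.0])
alpha = np.array([10.0, -5.0, 2.0])   # assumed bank-specific intercepts
y = alpha[bank] + 2.0 * x             # true common slope is 2, no noise

def demean_by_group(values, groups):
    """Subtract each group's mean from its own observations."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / counts)[groups]

x_within = demean_by_group(x, bank)
y_within = demean_by_group(y, bank)

# Least-squares slope on the demeaned data; the intercepts have cancelled.
beta_within = float(x_within @ y_within / (x_within @ x_within))
print(beta_within)
```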
Ordinary least squares is consistent in the uncorrelated version of the error-components model. However, the shared heterogeneity term induces serial correlation between observations of the same individual, and this must be corrected for. If the heterogeneity term is independent of the explanatory variables, the Random Effects (RE) method is used to overcome the serial correlation of panel data.
Logistic regression parameter estimates are usually found using the Maximum Likelihood Estimator (MLE) method. The maximum likelihood equation is constructed using the joint probability distribution and expresses the values of β in terms of known values for y. Chamberlain (1980) suggests maximizing the conditional likelihood function to obtain the conditional logit estimates for β.
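To make the maximum likelihood step concrete, the sketch below fits an ordinary (pooled) logit by Newton-Raphson on simulated data. It is an illustration only: the data, the seed, and the function name fit_logit are assumptions, and it implements neither Chamberlain's conditional likelihood nor the routines used in this study.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Maximize the logit log-likelihood by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))       # fitted probabilities
        score = X.T @ (y - p)                        # gradient of log-likelihood
        info = (X * (p * (1.0 - p))[:, None]).T @ X  # information matrix X'WX
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with hypothetical coefficients (intercept -0.5, slope 1.0).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logit(X, y)
print(beta_hat)
```

At the solution the score vector is numerically zero, which is the first-order condition of the likelihood equations described above.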
To test for fixed individual effects, one can perform a Hausman-type test based on the difference between Chamberlain's conditional MLE and the usual logit MLE ignoring the individual effects.
The null hypothesis of the Hausman test (Amini, 2012) is that the difference in coefficients is not systematic, that is, that the random-effect model is appropriate. If the null hypothesis is rejected, the fixed-effect model is accepted as appropriate and there is no need to consider the random-effect model.
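The test statistic behind this comparison has the standard quadratic form H = (b_FE - b_RE)' [V_FE - V_RE]^{-1} (b_FE - b_RE), which under the null hypothesis is compared with a chi-square distribution whose degrees of freedom equal the number of compared coefficients. The sketch below computes H from hypothetical coefficient vectors and covariance matrices; none of the numbers come from this study.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Return the Hausman statistic and its degrees of freedom."""
    diff = b_fe - b_re
    v_diff = v_fe - v_re   # assumed positive definite in this sketch
    stat = float(diff @ np.linalg.solve(v_diff, diff))
    return stat, diff.size

# Hypothetical fixed-effect and random-effect estimates for two coefficients.
b_fe = np.array([0.50, -0.20])
b_re = np.array([0.45, -0.25])
v_fe = np.diag([0.02, 0.03])
v_re = np.diag([0.01, 0.02])

stat, df = hausman_statistic(b_fe, b_re, v_fe, v_re)
print(stat, df)
# A large p-value from the chi-square(df) distribution would mean the null
# hypothesis is not rejected, which favors the random-effect model.
```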

Datasets
The sample of banks and the data used to train and test the models were taken from the website of The Banks Association of Turkey (BAT) (http://www.tbb.org.tr/english/). This site serves bankers, researchers, and investors and offers banks' financial tables as well as an overall evaluation of the performance of the Turkish economy and the banking sector.
This study includes data from 22 banks with commercial banking activities from 2001 to 2011 (Appendix A). After the Asian crisis, new regulations were introduced in the Turkish banking system, and Turkish banks were mostly spared the fate of being taken over by SDIF. Although some of the banks are rumored to be unhealthy, SDIF did not take over any of them. It is therefore necessary to create a new definition of bank failure. Return on Assets (ROA) may be used as a proxy for profitability, and profitability can be used to describe failure.
The ROA provides information about how much profit is generated on average by each unit of assets. ROA is therefore an indicator of how efficiently a bank is being run.

$$\mathrm{ROA} = \frac{\text{Net profit after taxes}}{\text{Assets}}$$
As a result, this study uses ROA to determine the health status of a bank; that is, ROA is used to determine dependent variable values. For this reason, this study examines the ROA ratios for the period from 1988 to 2000. The average ROA ratio for banks that were taken over by SDIF during this period is less than one for every year. Accordingly, if the ROA is less than one, the bank is classified as unsuccessful and coded as one. Otherwise, if the ROA is greater than one, the bank is classified as successful and coded as zero. The ROA is computed for each bank for each year.
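The classification rule can be sketched as a small function. This is a minimal illustration under two assumptions that the text does not state explicitly: that ROA is expressed in percent, and that a bank with an ROA of exactly one is treated as successful. The input figures are hypothetical.

```python
def roa_percent(net_profit_after_taxes: float, total_assets: float) -> float:
    """ROA in percent (the percent scale is an assumption of this sketch)."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return 100.0 * net_profit_after_taxes / total_assets

def failure_label(roa: float, threshold: float = 1.0) -> int:
    """Return 1 (unsuccessful) if ROA is below the threshold, otherwise 0."""
    return 1 if roa < threshold else 0

# Hypothetical bank-year figures in the same currency unit.
healthy = failure_label(roa_percent(25.0, 1000.0))  # ROA = 2.5
weak = failure_label(roa_percent(4.0, 1000.0))      # ROA = 0.4
print(healthy, weak)
```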
Panel data from 2001 through 2011 were used for training, and the 2012 data set was used for testing. In this manner, the models, which are constructed from the previous years' panel data, are tested on a new data set from a later period, which provides an ex-ante prediction.
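The time-based split can be sketched as follows. The records, bank names, and field names are hypothetical placeholders; only the year boundaries follow the description above.

```python
# Hypothetical bank-year records; "failed" is the ROA-based label.
records = [
    {"bank": "Bank A", "year": 2010, "failed": 0},
    {"bank": "Bank A", "year": 2011, "failed": 0},
    {"bank": "Bank A", "year": 2012, "failed": 1},
    {"bank": "Bank B", "year": 2010, "failed": 1},
    {"bank": "Bank B", "year": 2011, "failed": 1},
    {"bank": "Bank B", "year": 2012, "failed": 0},
]

# Train on 2001-2011 and hold out 2012 so the test data lie strictly in the future.
train = [r for r in records if 2001 <= r["year"] <= 2011]
test = [r for r in records if r["year"] == 2012]
print(len(train), len(test))
```

Splitting by year rather than at random keeps information from the test year out of the fitted model.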

Implementation
The variable set of this study includes the financial ratios of 22 Turkish commercial banks. Some of these ratios measure liquidity and some measure profitability, efficiency, solvency, and leverage. Appendix B contains a list of the financial ratios.
Because there are many ratios in financial tables that indicate the success levels of banks, and because some of the ratios are highly correlated with each other, this study uses a factor analysis to construct independent factors that affect the success status of the banks. Stata 12 was used as the analytical tool for the regression analyses. Factor analysis was performed using IBM SPSS Statistics 21. Because some financial ratios were very highly correlated, those with correlations higher than 0.90 were excluded before factor analysis was performed. Eighteen financial ratios were found appropriate for factor analysis.
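One way to screen out highly correlated ratios is sketched below. The text does not say which member of a correlated pair was excluded, so the greedy rule here (keep a ratio only if its absolute correlation with every ratio already kept is at most 0.90) is an assumption, and the data are simulated.

```python
import numpy as np

def drop_highly_correlated(data, names, threshold=0.90):
    """Keep a column only if |corr| with every kept column is <= threshold."""
    corr = np.corrcoef(data, rowvar=False)
    keep = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Simulated ratios: r2 is nearly a copy of r1, r3 is unrelated.
rng = np.random.default_rng(1)
r1 = rng.normal(size=200)
r2 = r1 + 0.05 * rng.normal(size=200)
r3 = rng.normal(size=200)
data = np.column_stack([r1, r2, r3])

kept = drop_highly_correlated(data, ["r1", "r2", "r3"])
print(kept)
```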
Because the data is in panel form, to check whether classical factor analysis is appropriate for pooled data, the Levin-Lin-Chu unit root test was used as a stationarity test for each ratio before factor analysis. All ratios are stationary, that is, they do not contain unit roots.
For the factor analysis, the standard SPSS Dimension Reduction routine was used. Factors with an eigenvalue greater than one were treated as meaningful. According to this eigenvalue criterion, there were seven meaningful factors: Interest income and expenditures, Equity, Other income and expenditures, Balance sheet, Deposit, Due, and Asset quality. The scores of these new factors were then used as independent variables for panel logistic regression analysis.
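The eigenvalue-greater-than-one rule can be illustrated on a small correlation matrix. The matrix below is hypothetical and much smaller than the eighteen-ratio matrix of this study, and the sketch shows only the counting rule, not the SPSS extraction or rotation settings.

```python
import numpy as np

# Hypothetical correlation matrix of four ratios forming two correlated pairs.
corr = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Retain factors whose eigenvalue exceeds one.
n_factors = int(np.sum(eigenvalues > 1.0))
print(eigenvalues, n_factors)
```

Here the two pairs of correlated ratios give two eigenvalues above one, so two factors would be retained.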
As discussed in Section 2, with pooled panel logistic regression, the panel structure of data is ignored. When the panel structure is considered, there is a choice between fixed-effect logistic regression and random-effect logistic regression.
To make this decision, a Hausman test was performed. The null hypothesis of the Hausman test is that the difference in coefficients is not systematic, in which case random-effect regression is appropriate. The p-value (Prob>chi2) was found to be 0.6300, so the null hypothesis is not rejected. This result means that the random-effect model is appropriate.
When the logistic regression routines were run, the small p-value, <0.00001, led to the conclusion that at least one of the regression coefficients in the model was not equal to zero, meaning that the models were meaningful.
To assess the models, each model was applied to both the training and test data sets. The models were evaluated using a confusion matrix (Kumar, 2005), which summarizes model performance in terms of true positives (TP), true negatives (TN), false positives, and false negatives. Accuracy is defined as the number of correct predictions divided by the number of observations. Error, which is usually the most important metric for model performance, is defined as the number of mistakes divided by the number of observations. Sensitivity, another intuitive metric for model performance, is defined as TP divided by the number of all actual positives. Because sensitivity measures how well a model finds positive instances, in some situations, such as identifying failed banks, sensitivity is more important than accuracy. Specificity indicates how well a model can find negative instances; it is defined as TN divided by the number of all actual negatives. When evaluating the results of a classification procedure, the measure to focus on depends on the issue at hand.
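These definitions can be sketched in a few lines. The labels below are hypothetical, with 1 denoting a failed bank (the positive class) and 0 a successful one.

```python
def classification_metrics(actual, predicted):
    """Accuracy, error, sensitivity, and specificity from 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # failed, flagged
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # healthy, not flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # healthy, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # failed, missed
    n = len(pairs)
    return {
        "accuracy": (tp + tn) / n,
        "error": (fp + fn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical outcomes for ten bank-year observations.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

In this example accuracy is 0.70 while sensitivity is 0.75, which shows that the two measures can rank models differently.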

Results
Because of the impact on the economy, it is more important to diagnose a bank that may be at risk of bankruptcy than to diagnose a healthy bank. Therefore, the focus of this study is sensitivity.
The overall accuracy, sensitivity, and specificity metrics were obtained for all trained models and are shown in Table 1.
Vol. 5, No. 3; 2016
With regard to sensitivity, it is interesting that while the rate for pooled panel logistic regression decreases from 81% on the training set to 60% on the test set, the rate for random-effect logistic regression increases from 96% to 100%. Because this study attempted to make an ex-ante prediction using a model based on data from the period prior to the time of bank failure, it is reasonable to expect a lower sensitivity rate for the test set. The increase may therefore be due to chance.

Conclusions
In this study, pooled panel logistic regression and random-effect logistic regression were implemented to analyze bank financial ratios. Data sets belonging to Turkish commercial banks were used.
According to the results, both pooled panel logistic regression and random-effect logistic regression are capable of extracting useful information from financial data. But when the main goal is to find positive instances, that is, failed banks, it is more appropriate to focus on sensitivity than accuracy.
When the focus is sensitivity, the best prediction performance was obtained under consideration of the panel structure of the data, using random-effect logistic regression. Therefore, using a longitudinal data structure and implementing a random-effect logistic regression can give useful information about the failure status of banks and can be used as part of an early-warning system.