Predicting Financial Failure Using Decision Tree Algorithms: An Empirical Test on the Manufacturing Industry at Borsa Istanbul

This study aims to develop a model using C5.0 and CHAID decision tree algorithms to estimate the financial failure and/or success of a given manufacturing company. Within the scope of this study, 35 financial ratios calculated from the companies’ annual financial statements and notes from 2007 to 2013 are used as independent variables. The dependent variable is the successful or unsuccessful status, in terms of financial capability, of 206 manufacturing firms listed on the Borsa Istanbul. Qualitative criteria are used to categorize the companies as successful or unsuccessful. The rates of accurate classification for both models are found to be at acceptable levels. Although the CHAID algorithm’s general rate of accuracy and its rate for successful companies are greater than the rates obtained from the C5.0 algorithm for the same observations, the CHAID algorithm yielded much lower results than the C5.0 algorithm in predicting unsuccessful companies.


Introduction
Due to its socio-economic impact, the topic of company failure has attracted the attention of researchers and led to multiple studies on the factors influencing the financial failure and/or success of companies. Recent bankruptcies and financial crises have also kept the topic on the agenda. Given the increasing complexity of financial instruments, the growing number of issuers, and the spread of securitization and globalization, the number of parties potentially affected by company failures can be expected to increase. Accordingly, studies on the prediction of company failure will continue to attract interest.
On the other hand, some studies have used different measures. For example, Beaver (1966) and Edminister (1972) accepted failure to meet financial liabilities on time, unpaid debts apart from those leading to bankruptcy, bounced checks, failure to distribute profit to privileged shareholders, and so on as measures of financial failure.
Financial failure is defined in the following ways: the appointment of an equity receiver, apart from applying for bankruptcy or reorganization as per the bankruptcy code (Altman, 1968); the decrease of a company's assets between two defined time periods (Wilcox, 1970); the inability to pay debts due or to make a deal with creditors in order to reduce debts, thus entering into bankruptcy (Blum, 1974); entering into liquidation as demanded by creditors (Deakin, 1976); entering into liquidation as demanded by creditors and the suspension of trading by court order (Taffler, 1982); incurring loss for three years or the termination of production due to financial crisis (Aktaş, 1993); according to a special definition for China made by Altman et al. (2007), incurring loss for two years or possessing net book value per share that is under net asset value per share, incurring a loss for one year and also possessing equity below the value of capital issued, and concern stated within the independent audit report on the continuity of the company; and incurring loss for two years together with share movement lower than that of the general index in which the share is traded (Özdemir, 2011).
This study, which uses qualitative indicators to identify unsuccessful companies, aims to classify companies that operate in the manufacturing industry as successful or unsuccessful according to their financial ratios.
Discriminant analysis, logistic regression, artificial neural networks, principal component analysis and decision trees are commonly used in the literature. This study uses decision tree algorithms, which, like artificial neural networks, are among the new-generation data mining methods commonly used in recent classification and estimation studies, on the grounds that their tree-like structure makes them easy to interpret (Koyuncugil, 2007; Koyuncugil & Özgülbaş, 2008).
Decision trees apply multiple tests, called decision tree algorithms, to a data set when determining how best to predict the dependent variable. This study utilizes the C5.0 algorithm developed by John Ross Quinlan and the chi-squared automatic interaction detector (CHAID) algorithm developed by Gordon V. Kass (Bounsaythip & Rinta-Runsala, 2001; Emel & Taşkın, 2005).
This study uses the annual financial reports and disclosures of 206 manufacturing companies listed on Borsa Istanbul for the period from 2007 to 2013. Upon reviewing financial failure studies conducted in Turkey, it is clear that no prior study has used the same definition of financial failure, data, methodology and period as this study.
This study is unique in that it reviews and analyzes news announcements and disclosures belonging to 206 companies, taken from the Borsa Istanbul website for the period prior to 2009 and from the Public Disclosure Platform (PDP) website for the period from 2010 onward. This study also looks at announcements made by Borsa Istanbul and PDP concerning market changes, companies delisted due to financial distress, companies transferred from the National Market to the Second National Market or the Watchlist Companies Market due to financial distress, companies obliged to make monthly declarations due to financial distress, companies warned by the Capital Markets Board of Turkey (CMB) or Borsa Istanbul to take precautionary measures due to loss of capital, and companies whose governing bodies applied to the courts.
The second section of our study summarizes the literature on measures of financial failure and methods used in previous studies. The third section explains the data set and method used in our study, and the fourth section analyzes the data obtained. The final section is the conclusion.

Indicators of Financial Failure
Both quantitative and qualitative indicators may be used to determine financial failure. Numeric indicators may be either book-value based or market-value based (Özdemir, 2011). On the other hand, the homogeneity of the companies present in a category decreases depending on the indicator chosen and category size.
Market-value-based indicators may achieve accurate results when markets are rather efficient and data is accessible. On the other hand, it may be argued that market-value-based indicators are inappropriate for cases like Turkey in which markets are inefficient. Therefore, using qualitative data to classify companies may prove less challenging.
This study defines a given company as unsuccessful provided that it is subject to any of the precautionary measures listed below. Its status as unsuccessful runs from the beginning of the year in which the measure was applied until the year it was lifted, if such were the case. The measures are as follows: a) Delisting from market due to financial distress; b) The suspension of trading due to financial distress; c) Change of market due to financial distress (i.e., transfer from the National Market to the Watchlist Companies Market or the Second National Market); d) Obligation to make monthly declarations due to financial distress; e) Notice to take precautionary measures by CMB or Borsa Istanbul due to capital loss, or application to court by the firm's authorized bodies.

Predicting Financial Failure and Methods Used
There has been a noticeable inadequacy of hypotheses regarding the causes of financial failure (Aktaş, 1997; Foster, 1986). It has been suggested that this inadequacy stems from the following causes: 1) Uncertainty when determining the variables that will be used to predict financial failure; 2) Uncertainty regarding the choice of linear or nonlinear model; 3) Uncertainty in determining the weights that will be given to variables.
Despite this inadequacy of hypotheses, explanatory and predictive models have been developed to solve the above problems.
Tables 2 and 3 summarize several studies in the literature on determining financial failure and their methods.

Data Set and Method
This study uses the annual financial statements and notes of companies in the manufacturing industry. These statements and notes pertain to the period ranging from 2007 to 2013, and are prepared according to International Financial Reporting Standards (IFRS). The sample consists of the 206 manufacturers listed in the Borsa Istanbul Equity Market. These publicly disclosed financial reports and explanations (balance sheets, income statements, cash flow statements and notes) are obtained from the Borsa Istanbul website (for the period prior to 2009), from the PDP website (from 2010 onward) and by using an analysis program provided by the Financial Information News Network (FINNET).
This study calculates 35 financial ratios in 5 groups that may have an effect on financial distress. When determining the financial ratios serving as independent variables, the prominent ratios within the literature were taken into consideration. As IFRS were in effect during the period under examination, it was possible to access data not considered by previous studies, such as cash flows resulting from operations/investments and foreign exchange positions. Table 4 lists the financial ratios used in this study along with their definitions. This study reviews the news and disclosures found on the Borsa Istanbul and PDP websites relating to the 206 companies selected, market transfers made by Borsa Istanbul and announced on the PDP website, and suspension-of-trading announcements on the Borsa Istanbul website. This study defines a company as financially distressed in any of the following circumstances: delisting due to financial distress, market transfer and obligation to make monthly declarations due to financial distress, notice from CMB or Borsa Istanbul to take precautionary measures due to capital loss (Note 2), or application to the courts by authorized bodies of the company. A company's status as financially distressed runs from the beginning of the year in which the precautionary measure was taken until the time the precaution was lifted, if such were the case.
The dependent variable is whether or not a company's status is that of financial distress. There are two possible categories for the dependent variable: the value 0 is given to financially distressed companies and the value 1 is given to companies that are not financially distressed.
Observations containing outliers/extreme values or missing values were eliminated from the study. Additionally, because the financial ratios of some companies differ markedly, all variables are normalized by subtracting the mean and dividing by the standard deviation in order to eliminate the negative effects of these differences on the model. In the resulting data set, 88.5% (1,149) of the observations are considered successful, while 11.5% (149) of the observations are considered to be unsuccessful companies.
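The normalization described above is the usual z-score transformation. A minimal sketch follows, assuming the population standard deviation (the study does not state which variant was used); the function name `zscore` and the sample values are hypothetical and serve only as illustration.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a list of ratio values: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical current-ratio values for five company-year observations
current_ratio = [0.8, 1.2, 1.5, 2.1, 3.4]
standardized = zscore(current_ratio)
print([round(z, 3) for z in standardized])
```

After this transformation each variable has mean 0 and standard deviation 1, which is why the split thresholds reported for the decision trees are small positive and negative numbers rather than raw ratio values.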
Previous studies and statistical reasoning indicate that the greatest danger in classification studies conducted on data with unbalanced numbers of successful and unsuccessful observations is that the classification success rate is higher for the group with more observations and lower for the other group. According to the pre-analysis, this danger exists for our original data set, as the numbers of successful and unsuccessful companies are unbalanced. It is therefore considered beneficial to run the study on a data set in which the numbers of successful and unsuccessful companies are balanced.
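The danger can be illustrated with the class counts of the full data set (1,149 successful and 149 unsuccessful observations). The sketch below uses a deliberately naive rule that always predicts the majority class; it is not one of the models in this study, and the helper name `class_accuracy` is hypothetical.

```python
def class_accuracy(y_true, y_pred, label):
    """Share of observations with the given true label that are predicted correctly."""
    idx = [i for i, y in enumerate(y_true) if y == label]
    return sum(y_pred[i] == label for i in idx) / len(idx)

# 1 = successful, 0 = unsuccessful; counts taken from the full data set
y_true = [1] * 1149 + [0] * 149
y_pred = [1] * len(y_true)  # naive rule: always predict the majority class

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
acc_successful = class_accuracy(y_true, y_pred, 1)
acc_unsuccessful = class_accuracy(y_true, y_pred, 0)
print(f"overall={overall:.3f} successful={acc_successful:.3f} unsuccessful={acc_unsuccessful:.3f}")
# prints: overall=0.885 successful=1.000 unsuccessful=0.000
```

A high overall rate can therefore coexist with a complete failure to detect unsuccessful companies, which is why class-wise rates are reported alongside the general rate.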
In our study, SPSS Clementine 11 software is used to define a data set that will serve as a representative for all observations. All of the unsuccessful companies are used and 14% of the successful companies are selected randomly to form a subsample consisting of 149 unsuccessful and 157 successful companies. Therefore, balance is ensured and multiple successful companies are preserved within the data set in order to avoid losing information obtained from the analysis. As a result, a subsample of 306 observations is taken as a basis to form the model. Table 5 gives the descriptive statistics of the variables used. The study applies the C5.0 decision tree algorithm to 266 observations (122 unsuccessful and 144 successful).
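The balancing step amounts to random undersampling of the majority class. The following is a minimal sketch of that idea, not a reproduction of the SPSS Clementine procedure: the function name `undersample`, the record layout, and the random seed are all assumptions made for illustration.

```python
import random

def undersample(observations, n_majority, seed=0):
    """Keep every unsuccessful (0) observation and a random draw of successful (1) ones."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    minority = [o for o in observations if o["status"] == 0]
    majority = [o for o in observations if o["status"] == 1]
    return minority + rng.sample(majority, n_majority)

# Placeholder records standing in for the 1,298 company-year observations
observations = (
    [{"id": i, "status": 1} for i in range(1149)]
    + [{"id": 1149 + i, "status": 0} for i in range(149)]
)
subsample = undersample(observations, n_majority=157)
print(len(subsample))  # 149 unsuccessful + 157 successful
```

Drawing 157 of the 1,149 successful observations corresponds to roughly 14% of that group, matching the proportion stated above.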
The process is continued until 10 observations remain on each leaf. Figure 1 displays the schema, and Appendix-1 states the cycle and rule steps of the schema. Examining the model, it can be seen that classification is made using variables KSA3, SY2, L9, SY4, L1, BK5 and KSE5. In other words, a company's status as successful or unsuccessful can be interpreted by values obtained from the following ratios: profit before tax / net sales, leverage ratio (total liabilities / total assets), working capital turnover rate (net sales / current assets), equity structure, current ratio (current assets / short-term liabilities), cash flows from operations / total liabilities and EBITDA I / total assets. Decision trees can be used to predict whether a company will be successful or unsuccessful. Accordingly, this feature of decision trees can be used when rating the companies (Öcal, 2014).
From the decision tree shown in Figure 1, it can be seen that 18 different company profiles emerge as a result of the classification of observations within the data set. Table 6 summarizes the company profiles. Profiles 1 and 2 are determined by variable KSA3. Variable KSA3 forms the first profile for observations smaller than or equal to -0.002 and forms the second profile for observations greater than -0.002. The first profile comprises 46 observations, of which 45 (97.8%) are unsuccessful companies. Therefore, companies whose variable KSA3 is less than or equal to -0.002 can be evaluated as unsuccessful. The second profile comprises 220 observations, of which 77 (35%) are unsuccessful and 143 (65%) are successful companies. Additional data is required to designate the companies whose variable KSA3 is greater than -0.002 as successful or unsuccessful.
Profiles 3 and 4 are determined by variables KSA3 and SY2. Profile 3 comprises companies in Profile 2 (with a KSA3 variable greater than -0.002) whose SY2 variable is greater than 0.551. Within Profile 3, there are 26 observations, of which 25 (96.15%) are unsuccessful. Therefore, companies with a KSA3 variable greater than -0.002 and an SY2 variable greater than 0.551 can be classified as unsuccessful. Profile 4 comprises companies within Profile 2 (with a KSA3 variable greater than -0.002) with an SY2 variable smaller than or equal to 0.551. In Profile 4, there are 194 observations, of which 52 are unsuccessful and 142 are successful. Additional data is required to classify companies with a KSA3 variable greater than -0.002 and an SY2 variable smaller than or equal to 0.551 as successful or unsuccessful.
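To see why a split on KSA3 is attractive, the class counts of Profiles 1 and 2 can be plugged into an entropy-based criterion. The sketch below computes information gain and gain ratio, the criteria associated with Quinlan's C4.5 family; whether C5.0 as configured in this study used exactly these quantities is an assumption, and the function names are hypothetical.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class-count tuple."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# (unsuccessful, successful) counts reported for the first split on KSA3
parent = (122, 144)
children = [(45, 1), (77, 143)]  # Profile 1 and Profile 2

gain = information_gain(parent, children)
split_info = entropy([sum(child) for child in children])  # penalizes many small branches
gain_ratio = gain / split_info
print(f"gain={gain:.3f} gain_ratio={gain_ratio:.3f}")
```

Profile 1 is almost pure (45 of 46 observations unsuccessful), so its entropy is close to zero and the split reduces the overall uncertainty about a company's status.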
The following results are obtained when the decision tree is interpreted as a whole:
a) Companies with a KSA3 variable less than or equal to -0.002 can be classified as unsuccessful (Node 1).
b) Companies with a KSA3 variable greater than -0.002 and an SY2 variable greater than 0.551 can be classified as unsuccessful (Node 18).
c) Companies with a KSA3 variable greater than -0.002, an SY2 variable less than or equal to 0.551, and an L9 variable less than or equal to -0.769 can be classified as unsuccessful (Node 5).
According to the results, the model's classification accuracy is 90.97% for the training set and 87.5% for the testing set. Thus, the classifications made by the C5.0 algorithm can be considered successful.
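Rules a) to c) can be written as a short function. This is only a sketch of the three rules stated above, not the full C5.0 tree: the deeper branches (involving SY4, L1, BK5 and KSE5) are not reproduced, and the function name and return labels are hypothetical. Inputs are the standardized ratio values.

```python
def c50_top_rules(ksa3, sy2, l9):
    """Apply rules a) to c) of the C5.0 tree to standardized ratio values."""
    if ksa3 <= -0.002:
        return "unsuccessful"   # rule a (Node 1)
    if sy2 > 0.551:
        return "unsuccessful"   # rule b (Node 18)
    if l9 <= -0.769:
        return "unsuccessful"   # rule c (Node 5)
    # The remaining branches of the tree are not covered by rules a) to c).
    return "undetermined"

print(c50_top_rules(ksa3=-0.5, sy2=0.0, l9=0.0))   # unsuccessful (rule a)
print(c50_top_rules(ksa3=0.3, sy2=0.2, l9=0.1))    # undetermined
```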

Results Obtained from CHAID Decision Tree Algorithm and Interpretation
As with the C5.0 algorithm, 266 observations are used in the formation of the CHAID algorithm model (122 unsuccessful and 144 successful). The process is continued until 10 observations remain on each leaf and 3 classifications emerge (see Figure 3) (Note 4). Appendix-2 lists the cycle and rule steps of the schema obtained.
It can be seen that classifications are made by variables L1, BK1, SY3, KSE5, L10, SY2, BK5, KSE2 and L2. In other words, a company's status as successful or unsuccessful can be interpreted by values obtained from the following ratios: current ratio, interest coverage, tangible fixed assets (net) / total equity, EBITDA I / total assets, assets turnover rate, leverage ratio, cash flows from operations / total liabilities, profit after tax / total equity, and liquidity.
In the decision tree shown above, 21 different company profiles emerge from the classification of the data set. Table 9 summarizes the features of each profile. Profiles 6 and 7 are determined by variables L1 and SY3. Companies with an L1 variable ranging from -0.556 to -0.428 and an SY3 variable equal to or less than 0.047 constitute Profile 6. Companies with an SY3 variable greater than 0.047 constitute Profile 7. Profile 6 comprises 12 unsuccessful companies. According to the model, companies with an L1 variable ranging from -0.556 to -0.428 and an SY3 variable equal to or less than 0.047 are categorized as unsuccessful. Profile 7 comprises 41 observations in total, of which 20 (48.78%) are unsuccessful and 21 (51.22%) are successful. Consequently, it is not possible to interpret a company's status as successful or unsuccessful by using the model if its L1 variable is between -0.556 and -0.428 and its SY3 variable is greater than 0.047. The model cannot determine an additional variable to facilitate the classification of firms in this group.
Profiles 13, 14 and 15 are determined by variables L1, KSE5 and SY2. Companies with an L1 variable greater than -0.428, a KSE5 variable greater than -0.698, and an SY2 variable equal to or less than 0.315 constitute Profile 13. Those companies with an SY2 variable ranging from 0.315 to 0.467 constitute Profile 14, and those companies with an SY2 variable greater than 0.467 constitute Profile 15. Profile 13 contains 109 observations, of which 6 (5.50%) are unsuccessful and 103 (94.50%) are successful. According to the model, companies are considered successful when they possess an L1 variable greater than -0.428, a KSE5 variable greater than -0.698, and an SY2 variable equal to or less than 0.315. The program continued classifying using variables BK5, KSE2 and L2. Although a group of 100% successful companies is obtained at the end of the classification process, the subsequent part of the decision tree is not thought to add significant benefit to the model.
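CHAID selects splits with chi-squared tests of independence between a candidate grouping and the dependent variable. As an illustration, the sketch below computes the Pearson chi-squared statistic for the table formed by Profiles 6 and 7. It omits the category merging and the adjusted p-values that a full CHAID implementation applies, and the function name is hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: Profile 6 (SY3 <= 0.047) and Profile 7 (SY3 > 0.047)
# Columns: unsuccessful, successful
table = [[12, 0], [20, 21]]
stat = chi_squared(table)
print(f"chi-squared = {stat:.2f}")
```

For a 2x2 table the statistic has one degree of freedom, and the 5% critical value is 3.841; a statistic above that level indicates that the SY3 split separates the two classes more than chance alone would suggest.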
The following results are obtained when the decision tree is interpreted as a whole:
a) Companies with an L1 variable less than or equal to -0.556 and a BK1 variable less than 0.030 can be considered unsuccessful (Node 2).
b) Companies with an L1 variable less than or equal to -0.556 and a BK1 variable greater than 0.030 are highly likely to be unsuccessful. However, the possibility of them being successful should not be ignored (Node 3).
c) Companies with an L1 variable ranging from -0.556 to -0.428 and an SY3 variable equal to or less than 0.047 can be classified as unsuccessful (Node 5).
d) Companies with an L1 variable ranging from -0.556 to -0.428 and an SY3 variable greater than 0.047 cannot be categorized as either successful or unsuccessful using the model (Node 6).
e) Companies with an L1 variable greater than -0.428, a KSE5 variable equal to or less than -0.698, and an L10 variable equal to or less than -0.970 can be categorized as unsuccessful (Node 9).
f) Companies with an L1 variable greater than -0.428, a KSE5 variable equal to or less than -0.698, and an L10 variable ranging from -0.970 to -0.492 are thought to have a high probability of failure (Node 10).
g) Companies with an L1 variable greater than -0.428, a KSE5 variable equal to or less than -0.698, and an L10 variable greater than -0.492 are thought to have a high probability of success (Node 11).
h) Companies with an L1 variable greater than -0.428, a KSE5 variable greater than -0.698, and an SY2 variable equal to or less than 0.315 are thought to have a high probability of success (Node 13). The program continued classifying using variables BK5, KSE2 and L2. Although a group of 100% successful companies is obtained at the end of the classification process, the subsequent part of the decision tree is not thought to add significant benefit to the model.
i) Companies with an L1 variable greater than -0.428, a KSE5 variable greater than -0.698 and an SY2 variable ranging from 0.315 to 0.467 can largely be considered successful. However, it should be kept in mind that a high margin of error exists (Node 20).
j) Companies with an L1 variable greater than -0.428, a KSE5 variable greater than -0.698 and an SY2 variable greater than 0.467 can be considered unsuccessful (Node 21).
k) Companies are considered successful if they possess an L1 variable greater than -0.428, a KSE5 variable greater than -0.698, an SY2 variable equal to or less than 0.315, a BK5 variable greater than -0.587, a KSE2 variable greater than -0.113, and an L2 variable greater than -0.462 (Node 19).
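Rules a) to k) can be collected into a single function. This is a sketch under stated assumptions: the function name, the dictionary layout, and the wording of the returned labels are hypothetical; values falling exactly on a boundary that the rules leave open (for example BK1 equal to 0.030, or the ends of a "ranging from" interval) are assigned to the lower branch; and intermediate nodes between Node 13 and Node 19 are collapsed into rule h).

```python
def chaid_rules(r):
    """Map standardized ratios to (label, node) following rules a) to k).

    `r` is a dict keyed by the variable codes used in the text.
    Boundary handling where the rules are silent is an assumption.
    """
    if r["L1"] <= -0.556:
        if r["BK1"] <= 0.030:
            return "unsuccessful", 2                      # rule a
        return "likely unsuccessful", 3                   # rule b
    if r["L1"] <= -0.428:
        if r["SY3"] <= 0.047:
            return "unsuccessful", 5                      # rule c
        return "undetermined", 6                          # rule d
    if r["KSE5"] <= -0.698:
        if r["L10"] <= -0.970:
            return "unsuccessful", 9                      # rule e
        if r["L10"] <= -0.492:
            return "likely unsuccessful", 10              # rule f
        return "likely successful", 11                    # rule g
    if r["SY2"] <= 0.315:
        if r["BK5"] > -0.587 and r["KSE2"] > -0.113 and r["L2"] > -0.462:
            return "successful", 19                       # rule k
        return "likely successful", 13                    # rule h
    if r["SY2"] <= 0.467:
        return "likely successful, high error margin", 20  # rule i
    return "unsuccessful", 21                             # rule j

# Hypothetical standardized ratios for one company-year observation
example = {"L1": 0.2, "BK1": 0.0, "SY3": 0.0, "KSE5": 0.1, "L10": 0.0,
           "SY2": 0.5, "BK5": 0.0, "KSE2": 0.0, "L2": 0.0}
result = chaid_rules(example)
print(result)  # high leverage (SY2 > 0.467) leads to Node 21
```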
As with the C5.0 algorithm, the analysis uses 306 observations that constitute 24% of the entire data set. The remaining observations belong to successful companies, and they are excluded from the analysis in order to maintain balance. Of the 306 observations, 266 are chosen to form a training set (49% unsuccessful and 51% successful). The remaining observations comprise a testing set. The C5.0 algorithm model is formed using seven variables, while the CHAID model is formed using nine. The following variables are used in both models: SY2 (leverage ratio (total liabilities / total assets)), L1 (current ratio (current assets / short-term liabilities)), BK5 (cash flows from operating activities / total liabilities), and KSE5 (EBITDA I / total assets).
The following section summarizes the results of the analysis of the C5.0 and CHAID decision tree algorithms. Firstly, as expected, both models classify firms based on the fundamental ratios related to leverage, liquidity, profitability and cash flow. Secondly, the models developed based on the C5.0 and CHAID decision tree algorithms classify both successful and unsuccessful firms with acceptable rates of accuracy.

Conclusion
Companies tend to operate as if they possess immortality. However, a considerable number of companies have to terminate operations for various reasons. The financial failure of a company affects numerous market participants such as investors, shareholders, workers, creditors, clients and authorizing bodies. Therefore, the failure of a company has an effect on the economy. Globalization and the increase in the number of new, more complex financial instruments compound the effects of failure. Within this context, predicting financial failure attracts the interest of researchers.
When predicting financial failure, studies make use of statistical and mathematical methods along with a theoretical approach. Artificial intelligence has also been used in recent years.
This study uses C5.0 and CHAID decision tree algorithms, which have frequently been utilized for classification and prediction studies in recent years.
The study tests the status of companies as successful or unsuccessful based on financial capability using more than one defining variable. Thus, our dependent variable consists of two categories.
The study uses the annual financial statements and notes of manufacturing companies listed in the Borsa Istanbul Equity Market during the period from 2007 to 2013. These documents are prepared according to IFRS. The 206 manufacturing companies traded on the Borsa Istanbul Equity Market have been selected for the sample provided they were listed for either all or part of the examination period. Publicly disclosed financial statements and their notes (balance sheets, income statements and cash flow statements) are collected, and 35 financial ratios in five categories have been calculated. These ratios comprise the independent variables of the study. The data set consists of 1,149 observations (88.5%) classified as successful and 149 observations (11.5%) classified as unsuccessful.
Rather than use all of the observations (1,298 in total with 149 unsuccessful and 1,149 successful), the study uses 306 observations (157 successful and 149 unsuccessful) comprising roughly 24% of the total data. In order to balance our sample, the remaining observations belonging to successful companies are excluded. The model is composed of 266 observations with 49% belonging to unsuccessful companies and 51% belonging to successful companies. The remaining observations are used for test purposes.
Evaluating the model developed using the C5.0 decision tree algorithm, it can be seen that a company's status as successful or unsuccessful can be determined by values received from the following ratios: profit before tax / net sales, leverage ratio, working capital turnover rate, equity structure, current ratio, cash flows from operations / total liabilities, and EBITDA I / total assets.
The model formed by the C5.0 algorithm is applied to the whole data set (1,298 observations), producing a general accurate classification rate of 85.13%. The rates of accuracy are 84.16% for successful companies and 92.62% for unsuccessful companies.
Evaluating the model developed using the CHAID decision tree algorithm, it can be seen that a company's status as successful or unsuccessful can be determined by values received from the following ratios: current ratio, interest coverage, tangible fixed assets (net) / total equity, EBITDA I / total assets, assets turnover rate, leverage ratio, cash flows from operations / total liabilities, profit after tax / total equity, and liquidity ratio.
The model formed by the CHAID algorithm is applied to the whole data set (1,298 observations), producing a general accurate classification rate of 87.37%. The rates of accuracy are 89.03% for successful companies and 74.5% for unsuccessful companies.
Comparing the results of the models formed by the C5.0 and CHAID algorithms, it can be seen that accurate classification rates are at acceptable levels for both models. Although the CHAID algorithm's general rate of accuracy and its rate for successful companies are greater than the rates obtained from the C5.0 algorithm for the same observations, the model formed by the C5.0 algorithm can be considered more successful than that of the CHAID algorithm. This is because the accurate classification rate for unsuccessful companies obtained from the model formed by the C5.0 algorithm is far greater than that obtained from the model formed by the CHAID algorithm.
The C5.0 decision tree algorithm predicts company success and failure with an accuracy rate of 85%-93% while the CHAID algorithm does so with an accuracy rate of 75%-89%.
It can be concluded that both models classify firms as expected based on fundamental ratios related to leverage, liquidity, profitability and cash flows. Furthermore, the models developed based on decision tree C5.0 and CHAID algorithms can be used for classifying both successful and unsuccessful firms at acceptable levels. We also propose that these models be used for credit rating studies and/or practices and the scoring of firms.
the market value of company shares greater than the change in the index in which the shares are traded over a specific period
- Decrease in the market value of the company shares greater than the return of the other shares in the same market over a specific period
- Decrease in the net written-down value of a share below its net book value
Most studies use book-value-based indicators, and companies are classified using one or more of these indicators.

Figure 4
Figure 4 displays the success graphs obtained from the training and testing sets. The uppermost line shows the

Table 1.
Indicators used to determine financial failure
Table 1 outlines the various types of indicators.

Table 2.
Summary of studies on determination of financial failure and methods used

Table 3.
Summary of studies on predicting financial failure in Turkey and their methods
Ekşi (2011): total assets, Cash flows from operations / total liabilities, Total liabilities / total assets and Net sales / short-term liabilities can be used to predict financial failure
Note. The table has been expanded using information from the work of Ekşi (2011).

Table 4.
Financial ratios used in this study and their definitions

Table 5.
Descriptive statistics regarding variables

Table 6.
Company profiles obtained from the schema formed by C5.0 decision tree algorithm

Table 7.
General accurate classification rates of training and testing sets formed by the C5.0 algorithm
Table 8 displays the rates of accurate classification obtained from the C5.0 algorithm within the total data set. The general rate of accurate classification is 85.13%, and the accurate classification rate is 84.16% for successful companies and 92.62% for unsuccessful companies.

Table 8.
Accurate classification rates obtained from the schema formed by the C5.0 algorithm

Table 9.
Company profiles obtained from the schema formed by the CHAID decision tree algorithm