Survival Analysis Employed in Predicting Corporate Failure: A Forecasting Model Proposal

In the face of the current economic and financial environment, predicting corporate bankruptcy is a topic of increasing interest to investors, creditors, borrowing firms, and governments alike. Within the literature on bankruptcy forecasting we find diverse types of research employing a wide variety of techniques, but only a few researchers have used survival analysis to examine this issue. We propose a model for the prediction of corporate bankruptcy based on survival analysis, a technique which stands on its own merits. In this research, the hazard rate is the probability of "bankruptcy" at time t, conditional upon having survived until time t. Many hazard models are applied in a context where the passage of time naturally affects the hazard rate. The model employed in this paper uses the survival time or the hazard rate as the dependent variable, treating the companies that did not fail as censored observations.


Introduction
The problem of corporate bankruptcy has been, and will surely remain, a topic of particular interest to a broad set of economic agents. Corporate bankruptcy, whether economic, financial or legal, can result from a diverse set of complex causes, both internal and external, that can be attributed, for example, to a weak organizational structure, the company's own strategy, technological changes, or changing economic conditions.
The development of predictive models for corporate bankruptcy is a strand of research that was driven by the seminal work of Beaver (1966) and Altman (1968), and a growing number of researchers are interested in this subject.
In the past four decades many studies have been published, with methodological refinements that were not always accompanied by an improvement in the results obtained. Perhaps that is the reason why, in recent years, researchers have searched for alternative techniques and tools in order to develop models with greater usefulness and accuracy.
Overall, it seems arguable that the models developed for corporate bankruptcy prediction can be useful aids to decision making. Therefore, in this paper we propose a model for corporate bankruptcy prediction based on survival analysis. Although the technique is seldom employed in this type of study, we believe its possibilities have so far been little explored, and therefore that this paper can offer a significant contribution to the existing research in the bankruptcy prediction field.

The Cox Proportional Hazards Model
A brief rationale of the Cox proportional hazards model follows, preceded by a description of the survival and hazard functions. According to Collett (1994), the actual survival time t of an individual can be regarded as the realization of a random variable T, which may assume any non-negative value. Here, T indicates the time to failure of a firm. T is thus associated with the survival time and follows a given probability distribution. With T a continuous random variable and f the underlying probability density function, the distribution function is given by
$$F(t) = P(T < t) = \int_0^t f(u)\,du, \tag{1}$$
which represents the probability of the survival time being less than a given value t.
The survivor function S(t) is defined as the probability that a firm survives for at least t time units, that is, that its survival time is greater than or equal to t:
$$S(t) = P(T \ge t) = 1 - F(t). \tag{2}$$
The survival function therefore represents the probability that the survival time of an individual reaches or exceeds a given value t.
The hazard function describes the evolution over time of the instantaneous rate of "death" of a firm. To obtain the hazard function, we consider the probability that the random variable T associated with the survival time lies between t and t+δt, conditional on T being greater than or equal to t, which can be written as
$$P(t \le T < t+\delta t \mid T \ge t).$$
The hazard function h(t) is then the limit of that probability divided by the time interval δt, with δt tending to zero:
$$h(t) = \lim_{\delta t \to 0} \frac{P(t \le T < t+\delta t \mid T \ge t)}{\delta t}. \tag{4}$$
The hazard is the probability of failure in the next instant, given that the firm was alive at time t (Lane et al., 1986).
According to Collett (1994), some useful relationships between the survival and the hazard functions can be obtained from this point.
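Considering Bayes' theorem, in the form of the definition of conditional probability, the probability of a given event A, conditional on a given event B, is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Based on this result, the conditional probability in the hazard function of equation (4) is
$$P(t \le T < t+\delta t \mid T \ge t) = \frac{P(t \le T < t+\delta t)}{P(T \ge t)} = \frac{F(t+\delta t) - F(t)}{S(t)}, \tag{5}$$
so that
$$h(t) = \lim_{\delta t \to 0} \left\{ \frac{F(t+\delta t) - F(t)}{\delta t} \right\} \frac{1}{S(t)}.$$
The limit in braces is the definition of the derivative of F at the moment t, given by f(t), and therefore
$$h(t) = \frac{f(t)}{S(t)}. \tag{9}$$
Since $f(t) = -dS(t)/dt$, it follows that $h(t) = -\frac{d}{dt}\log S(t)$. The survival function S(t) can then be obtained from the following equation:
$$S(t) = \exp\{-H(t)\}, \qquad H(t) = \int_0^t h(u)\,du.$$
There are two main reasons for modelling survival data. One is to determine which combination of potential explanatory variables affects the shape of the hazard function. The other is to obtain an estimate of the hazard function for a particular company. One model that can be applied is the proportional hazards model proposed by Cox (1972), also known as the Cox regression model.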
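Before turning to the Cox model, the relationships between the density, survival and hazard functions described in this section can be checked numerically. The following is a minimal Python sketch, assuming a Weibull distribution with arbitrary illustrative parameters (`SHAPE` and `SCALE` are hypothetical and not taken from this paper). It compares h(t) = f(t)/S(t) with the limit definition of the hazard, and S(t) with exp{−H(t)}.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE, SCALE = 1.5, 20.0

def density(t):
    """f(t) for a Weibull distribution."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * math.exp(-(t / SCALE) ** SHAPE)

def survival(t):
    """S(t) = P(T >= t) = 1 - F(t)."""
    return math.exp(-(t / SCALE) ** SHAPE)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def cumulative_hazard(t, steps=10_000):
    """H(t): integral of h(u) from 0 to t, approximated by the midpoint rule."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width

t = 12.0
delta = 1e-6
# P(t <= T < t + delta | T >= t) / delta, for a small delta.
limit_estimate = (survival(t) - survival(t + delta)) / survival(t) / delta

print(round(hazard(t), 6), round(limit_estimate, 6))
print(round(survival(t), 6), round(math.exp(-cumulative_hazard(t)), 6))
```

Each printed pair should agree to within numerical error, which is what the identities above predict for any continuous survival distribution.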
The model can be defined as follows. Assume that the hazard of "failure" at a given time depends on the values $x_1, x_2, \ldots, x_p$ of p explanatory variables $X_1, X_2, \ldots, X_p$. The set of values of the explanatory variables in the proportional hazards model is represented by the vector x, so that $x = (x_1, x_2, \ldots, x_p)$.
We denote by $h_0(t)$ the hazard function of a company for which the values of all the variables that make up the vector x are zero. The function $h_0(t)$ is called the baseline hazard function. The hazard function for the i-th company can then be written as
$$h_i(t) = \psi(x_i)\,h_0(t),$$
where $\psi(x_i)$ is a function of the values of the vector of explanatory variables for the i-th company.
The function $\psi(x_i)$ can be interpreted as the hazard at time t for a company whose vector of explanatory variables is $x_i$, relative to the hazard for a company for which x = 0.
Since the relative risk $\psi(x_i)$ cannot be negative, it is written as $\exp(\eta_i)$, where $\eta_i$ is a linear combination of the p explanatory variables in $x_i$. Therefore,
$$\eta_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi},$$
which is equivalent to
$$\eta_i = \beta' x_i, \tag{15}$$
where β is the vector of coefficients of the explanatory variables $x_1, x_2, \ldots, x_p$ in the model.
The quantity $\eta_i$ is called the linear component of the model, also known as the risk score or prognostic index for the i-th firm. The proportional hazards model can generally be expressed as follows:
$$h_i(t) = \exp(\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi})\,h_0(t).$$
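To make the risk score and the relative hazard concrete, the following minimal Python sketch computes both for two firms. It assumes that the constants b1, b2 and b6 listed in Algorithm 1 of this paper are the coefficients of the variables X1, X2 and X6; the ratio values for `firm_a` and `firm_b` are hypothetical and do not come from the sample.

```python
import math

# Assumption: b1, b2, b6 from Algorithm 1 are the coefficients of X1, X2, X6.
BETA = {"X1": 1.805, "X2": -1.867, "X6": -7.156}

def risk_score(x):
    """Linear component eta_i = sum over j of beta_j * x_ji."""
    return sum(BETA[name] * x[name] for name in BETA)

def relative_hazard(x):
    """psi(x_i) = exp(eta_i) = h_i(t) / h_0(t); it does not depend on t."""
    return math.exp(risk_score(x))

# Hypothetical ratio values for two firms (illustration only).
firm_a = {"X1": 0.10, "X2": 1.20, "X6": 0.05}
firm_b = {"X1": 0.10, "X2": 1.20, "X6": 0.15}

eta_a, eta_b = risk_score(firm_a), risk_score(firm_b)
hazard_ratio = relative_hazard(firm_b) / relative_hazard(firm_a)
print(round(eta_a, 4), round(eta_b, 4), round(hazard_ratio, 4))
```

Because the two firms differ only in X6, their hazard ratio reduces to exp(b6 × 0.10) at every point in time, which is the proportionality property that gives the model its name.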

Survival Analysis in Predicting Business Failure
From the literature review, one can conclude that the number of studies using this type of analysis in the field of business failure prediction is still very small. Nevertheless, there are a number of significant contributions in the literature, namely from Lane et al. (1986), Luoma and Laitinen (1991), Chen and Lee (1993), Audretsch and Mahmood (1995), Laitinen (1999), Laitinen and Kankaanpää (1999), Partington et al. (2001), Shumway (2001) and Parker et al. (2002). A brief reference to the first two follows, as they are the most cited works employing survival analysis in this research field.
Lane et al. (1986) were the first to employ the Cox model to predict bank failure, using a sample of 130 banks that failed between January 1978 and June 1984 and another sample of 334 non-failed banks. The survival time for each failed bank was defined as the time (in months) from December 31st of the year used to calculate the financial ratios to the date of bankruptcy. For banks that did not fail, the censored survival time was defined as the time (in months) from December 31st of the year used to calculate the financial ratios to December 31st of the year of failure of a paired failed bank. The classification procedure proposed by the authors was based on computing the probability that a bank survives more than t months, given the values of its financial ratios, with t = 12 for the model based on data from one year prior to failure and t = 24 for the model based on data from two years prior to failure.
The results of Lane et al. indicated that the overall accuracy of the Cox model was similar to that obtained with discriminant analysis, although the type I error was lower in the Cox model.
Luoma and Laitinen (1991) also applied survival analysis to predicting business failure. These authors used a sample of 36 failed companies (24 industrial and 12 retailing firms), each paired with a non-failed company in the same business and of similar size. The results were compared with models developed from discriminant analysis and logistic regression. The percentage of correct classifications was 61.8%, 70.6%, and 72.1% for survival analysis, discriminant analysis and logistic regression, respectively. The authors attributed the lower accuracy of the model based on survival analysis to the different failure processes found in the data.

The Proposal of a Predictive Model
Taking into consideration the models offered in the literature, and employing a specific set of variables that we find appropriate to test using a survival function, we present here our proposal of a predictive model of corporate failure.

The Variables Used
In this paper, several economic and financial indicators were used to construct a set of independent variables. As in many studies devoted to predicting business failure, the selection of the independent variables was based on their popularity, measured by their use in previous studies.
Table 1 lists the 28 selected indicators, which were collected from the balance sheet and the income statement of the companies included in the sample.

The Companies' Sample
In order to fit the model, it was necessary to obtain a sample of companies in which the event of interest occurred, that is, in which there was a closure of activity.
Based on the information provided by insolvency administrators it was possible to obtain a sample of 11 companies, whose survival times were known and that are classified as belonging to the group of failed companies (Note 1).
Concurrently, we obtained a sample of 16 companies that did not fail, i.e., with survival times censored.
All companies belong to the textile industry and the information needed was collected from the balance sheets and income statements of three consecutive financial years, comprising the time periods from 2003 to 2007.
Taking into consideration the survival times, it was possible to split each company into 3 sets of observations, which resulted in a group of failed companies with 33 observations and a group of companies that did not fail with 48 observations (Note 2). To illustrate, consider the data from a company that was active until six months after the latest year for which we have data records. Since we collected data for 3 consecutive years, it is possible to have data for 6, 18 and 30 months prior to the time of business closure. This procedure was repeated for all 27 companies in the sample.
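The splitting procedure just described can be sketched in a few lines of Python. This is an illustration only: the function name `split_into_observations` and the record layout are hypothetical, not the authors' implementation.

```python
def split_into_observations(company_id, months_after_last_year, failed, n_years=3):
    """Return one observation per financial year, most recent year first.

    Each earlier year adds 12 months to the survival (or censoring) time.
    """
    return [
        {
            "company": company_id,
            "years_back": k,
            "time_months": months_after_last_year + 12 * k,
            "event": 1 if failed else 0,  # 1 = failure observed, 0 = censored
        }
        for k in range(n_years)
    ]

# The example from the text: a company active until six months after its
# latest recorded year.
rows = split_into_observations("example", 6, failed=True)
print([row["time_months"] for row in rows])  # [6, 18, 30]

# 11 failed and 16 non-failed companies, three observations each.
print(11 * 3, 16 * 3)  # 33 48
```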
The selection of the explanatory variables followed the procedure of Collett (1994), and the testing was performed using SPSS software, version 20.0.
The explanatory variables that contributed significantly to the reduction of the −2 log-likelihood statistic are shown in Table 2. The last column of that table, Exp(B), gives the risk, relative to the baseline function, associated with a one-unit change in the respective explanatory variable.
Table 3 shows the survival time and the values of the model's explanatory variables for each of the companies in the testing sample. The last column shows the risk of each company, calculated using equation 14. The model shows the largest discrepancies between risk and survival time for companies 2, 12, 22, 27 and 33. However, this conclusion should be complemented with an analysis of the survival function of each of these companies, as examined later. Overall, the model exhibits good results for the remaining companies and displays conformity between risk and survival time.
The verification sample was composed of all the observations that were not used to fit the model, consisting of 13 observations where the event of interest took place and 24 censored observations, as shown in Table 4. The last column, which as mentioned before indicates the risk of each company, shows that the model misclassified observation 58, attributing to it an almost zero risk when it is known that the company would go out of business a year and a half later. In addition to company 58, the model shows the largest discrepancies between risk and survival time for companies 55, 57, 68 and 73. Apart from these companies, we can conclude that, in general, the model shows a significant correlation between relative risk and survival time for the observations of the verification sample.

Survival Algorithm
Table 5 shows the values of the survival function evaluated at the mean values of the variables.
Together with Algorithm 1, shown in the appendices, these values make it possible to calculate the survival function of a given company at each available moment in time, providing a picture of its behaviour over that period. The operation of the algorithm used in this paper is very simple. After starting Matlab, one only needs to enter the function name and input the values of the indicators as prompted. To give an idea of the outcome of the algorithm, we use two companies as illustration: Company 3, whose data refer to the six months before the close of business activity, and Company 42, with censored data relating to 36 months before the end of the study.
Algorithm 1. Values of the survival function
function surviv
% te: time points, in months; b1, b2, b6: coefficients for X1, X2 and X6
te = [4,5,6,8,10,16,17,18,20,22,28,29,30,32,34];
b1 = 1.805; b2 = -1.867; b6 = -7.156;
H0 = [0.018,0.061,0.236,0.421,0.622,1.133,1.428,2.087,2.493,2.961,4.014,5.233,8.549,13.361,16.959];
The output of Algorithm 1 begins by showing the value of the survival function for each of the periods referred to (in months). These values appear in the first column of the survival tables, namely Table 6 for Company 3 and Table 7 for Company 42. The information is complemented by the corresponding plots of the survival function, in Figures 1 and 2 for Companies 3 and 42, respectively. As shown by the values of the survival function for Company 3, the probability of surviving two months beyond the actual period of 6 months is only 22.5%, which is easily seen in the respective plot in Figure 1, where there is a steep drop after 5 months.
Concerning Company 42, the probability of surviving beyond 4 months is about 99.99%, and the probability of still being in business at 34 months is almost the same, with a decrease of only about 2%.
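The calculation that Algorithm 1 appears to perform can be sketched in Python. The sketch rests on two assumptions that the listing itself does not confirm: that `H0` holds the baseline cumulative hazard at each time in `te`, and that the survival function follows the usual Cox relationship S(t) = exp{−H0(t) · exp(b1·X1 + b2·X2 + b6·X6)}. The function name `survival_curve` and the indicator values passed to it are hypothetical.

```python
import math

# Values copied from Algorithm 1.
TE = [4, 5, 6, 8, 10, 16, 17, 18, 20, 22, 28, 29, 30, 32, 34]  # months
B1, B2, B6 = 1.805, -1.867, -7.156
H0 = [0.018, 0.061, 0.236, 0.421, 0.622, 1.133, 1.428, 2.087, 2.493,
      2.961, 4.014, 5.233, 8.549, 13.361, 16.959]

def survival_curve(x1, x2, x6):
    """Return (month, S(month)) pairs under the assumed relationship
    S(t) = exp(-H0(t) * exp(b1*x1 + b2*x2 + b6*x6))."""
    relative_risk = math.exp(B1 * x1 + B2 * x2 + B6 * x6)
    return [(t, math.exp(-h0 * relative_risk)) for t, h0 in zip(TE, H0)]

# Hypothetical indicator values, not those of any company in the sample.
for month, s in survival_curve(x1=0.10, x2=1.20, x6=0.05):
    print(f"{month:2d} months: S = {s:.4f}")
```

Under these assumptions the curve never increases over time, and a lower risk score flattens it, which is consistent with the behaviour reported for Companies 3 and 42.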
In order to calculate the error percentages and the classification accuracy for the estimation and validation samples, a cut-off point of 0.5 was used; that is, a prediction is considered correct when the estimated probability that a company survives at least as long as the time period associated with it is greater than 0.5.
Of the 65 observations in the estimation sample, the model misclassified 9. In 4 of these the actual survival time was less than expected (type I error), and in 5 the actual survival time was higher than expected (type II error). Based on these results, the type I error was 6.15% and the type II error was 7.69%.
In the validation sample, which comprised 55 observations, the model exhibited a similar behaviour: the actual survival time was lower than expected in 4 cases (7.27%) and higher than expected in 7 cases (12.73%).
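The error percentages reported above follow directly from the misclassification counts. The short Python sketch below reproduces that arithmetic; the helper name `error_rates` is hypothetical, and the overall accuracy it returns is derived here from the counts and is not a figure stated in the paper.

```python
def error_rates(n_observations, n_type_i, n_type_ii):
    """Return type I error, type II error and overall accuracy,
    each as a percentage of all observations."""
    type_i = 100 * n_type_i / n_observations
    type_ii = 100 * n_type_ii / n_observations
    accuracy = 100 - type_i - type_ii
    return round(type_i, 2), round(type_ii, 2), round(accuracy, 2)

print(error_rates(65, 4, 5))  # estimation sample
print(error_rates(55, 4, 7))  # validation sample
```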

Conclusions
The model developed in this paper employs survival time, or the hazard rate, as the dependent variable and assumes that failed and non-failed companies come from the same population, treating the latter as censored observations.
The main advantage of the model lies in the additional information it provides. This approach offers a different perspective, since the survival curve of a particular company tells us the likelihood of that company surviving beyond a given time period and hence its risk of falling into bankruptcy.
However, as with other methods, the accuracy of the model developed in this paper depends entirely on the quality of the data on which it is built.
This model relies on the proportionality of risks, which in reality may not always hold. Another relevant limitation is the difficulty of obtaining the survival times, i.e., the time at which the phenomenon being analysed occurs.
Based on the results obtained from the sample used, it seems to us that this method offers good prospects for the development of forecasting models in the bankruptcy research field. We are convinced that, by using a larger sample of firms, including audited accounts and incorporating qualitative variables, it may be possible to develop a model with higher predictive power, which may be of great use for decision-making.
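Note: in the expression for the survival function, $S(t) = \exp\{-H(t)\}$, the function H(t) is called the cumulative hazard function and may be easily obtained from equation (10).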

Table 1. Independent variables

Table 2. Variables in the equation
The variable X1 represents (Current assets − Current liabilities) / Total liabilities, the variable X2 represents Current assets / Current liabilities, and the variable X6 represents Cash flow / Current liabilities. The last column reports Exp(B).

Table 3. Testing sample

Table 5. Survival function table