Designing Optimal Knowledge Base for Neural Expert Systems

One of the limitations of conventional expert systems and traditional machine induction methods in capturing human expertise lies in their requirement of a large pool of structured samples from a multi-criteria decision problem domain. In addition, experts may have difficulty expressing explicitly the rules by which each decision was reached. To overcome these shortcomings, this paper reports on the design of an optimal knowledge base for machine induction that integrates Artificial Neural Networks (ANN) and Expert Systems (ES). In this framework, an orthogonal plan is used to define an optimal set of examples to be taken from the problem domain. The holistic judgments of experts on these examples then provide a training set for an ANN, which serves as the initial knowledge base for the integrated system. Any counter-examples found in generalization over new cases are added to the training set, and the network is retrained to enlarge its knowledge base.


Introduction
Since expertise is scarce and not always available in some domains, Expert Systems (ES) seek to transfer problem-solving knowledge from human experts to computer-based systems and then on to non-experts. In a conventional ES, each production rule must be explained in detail. However, the literature on cognitive psychology points out that most experts cannot express explicitly how their decisions were reached (Nisbett & Wilson, 1977; Ericsson & Simon, 1980). Knowledge acquisition for conventional ES therefore has to deal with seemingly incomplete and/or inconsistent information provided by the experts. Machine induction techniques have been used to deduce new knowledge from exemplary solutions to problems in the domain provided by the experts. However, most machine induction techniques are hindered by the inadequacy of traditional statistical methods and/or by the need for a large sample set to represent a problem domain.
To overcome these limitations, this paper reports on the design of an optimal knowledge base for an integrated system of Artificial Neural Network (ANN) and ES. In this framework, an orthogonal plan is used to define an optimal set of examples to be taken from a problem domain. The holistic judgments of experts on these examples then provide a training set for an ANN. With its abilities in pattern recognition and function approximation, the ANN learns the decision patterns and production rules in the sample set to build the system's initial knowledge base, and then generalizes this rule base to new cases. Any counter-examples found in generalization are added to the training set, and the network is retrained to enlarge its knowledge base. The knowledge base of the integrated system thus grows over time as the system learns new patterns, in the same way human beings broaden their knowledge.
The paper is organized as follows. The next section reviews the limitations of conventional ES and machine induction techniques. The ability of ANN as a machine induction technique is then discussed. After that, the design of an initial knowledge base for an integrated system of ANN and ES is presented, followed by an illustration of the capacity of such an initial knowledge base in an integrated system. The paper concludes with its findings and directions for future research.

Limitations of Conventional Expert Systems and Machine Induction of Expertise
The goal of knowledge acquisition for ES is to obtain the information necessary to represent knowledge as production rules in the form of cause/effect, situation/action, or if-then-else (Chorafas, 1990). However, discovering the expert's mental models can be very difficult. Verbal protocols may be informative but incomplete. The technique may even yield erroneous information, since people may state, and genuinely hold, one belief but frequently act in quite a different manner. Furthermore, it has been noted that personal belief structures are not available for inspection (Norman, 1983). Anderson (1983) proposes that the skill acquisition of human experts goes through three stages. In the declarative stage, new information is acquired in a declarative form containing domain-independent facts. Next comes the knowledge compilation stage, in which declarative knowledge is transformed into a domain-specific procedural form by recording the conditions under which a piece of declarative knowledge fits into production rules. The final stage is the procedural stage, in which procedures become more automated. As declarative knowledge becomes procedural, the ability to verbalize it is lost. These cognitive changes in knowledge acquisition reduce the ability to report on cognitive processes and make proceduralized task knowledge increasingly inaccessible.
Due to the automatization of cognitive skills and the proceduralization of domain knowledge, there is evidence that people generally have little or no introspective access to higher-order cognitive processes. People are rarely able to give retrospective reports on what they were thinking about in solving a problem (Nisbett & Wilson, 1977; Ericsson & Simon, 1980). In fact, there is indication that the more task knowledge people have, the less able they are to verbalize that knowledge (Berry & Broadbent, 1984).
Numerous studies have reported on the limitations of human cognition (Best, 1998). Human memory is prone to forgetting and to distortion of stored information (Baddeley, 1976). The rationality of human thinking in the formal logical sense can be questioned, and some aspects of deductive reasoning prove to be problematic for human beings (Johnson-Laird & Wason, 1977). People have difficulty handling combinations of uncertain evidence in multi-dimensional decision problems (Kahneman, Slovic & Tversky, 1982).
Given these cognitive limitations, building conventional ES is difficult: such systems require structured rule sets, but experts cannot always express explicitly the rules and procedures used in arriving at a solution. As the domain gets larger and more complex, experts become unable to explain how they operate (Giarratano & Riley, 2004). However, they can supply suitable examples of problems and solutions that reflect their conceptualization of the domain. Behind the holistic assessment of problem alternatives there always lies an implicit logic linking input and output patterns. Machine induction could be a suitable means of recognizing and revealing these patterns.
It has been found that human beings in general do have the ability to make holistic judgments (Slovic & Lichtenstein, 1971), and that expertise is based mostly on the recognition of patterns in the problem space (Smith, Adams & Schorr, 1978; Chase & Ericsson, 1982). These characteristics make machine induction a more appropriate technique for discovering the decision patterns in a set of holistic assessments provided by experts.
Machine induction offers the possibility of deducing new knowledge (Hassoun, 1995; Haykin, 2009). It can list all the factors that influence a decision, without understanding their impacts, and induce a rule that works successfully (Turban, Sharda & Delen, 2011). The method needs only pre-classified examples and the consideration of all samples in the domain. However, one of its major disadvantages is the requirement of a database containing sufficiently documented cases structured around human knowledge of the problem domain. In addition, the induced rules can be too large or too complex, leading to unintelligibility.
Machine induction methods have used various algorithms to convert knowledge about related attributes, values and relationships in problem solving into rules. Such algorithms range from statistical methods (e.g., decision trees, discriminant analysis, step-wise regression, principal component analysis, factor analysis) to neural computing (Flores, 2011). Most machine induction techniques rely on traditional statistical methods, and it has been pointed out that these methods cannot handle nonlinear relationships without imposing strong assumptions on the behavior of the data.

Machine Induction of Expertise with Integrated ANN and ES
To overcome the inadequacy of traditional machine induction techniques in handling nonlinear relationships, and to avoid imposing a priori restrictions on the data, Artificial Neural Networks (ANN) could be an appropriate technique for machine learning and ES building. This technique is particularly useful for recognizing input-output patterns in heuristics that cannot be expressed explicitly by the experts. An ANN does not require an elaborate a priori model or a pre-specified probability distribution of the data in order to learn the underlying discriminant patterns and associations.
An ANN contains processing/computing units called neurons (or nodes). These nodes are arranged into layers, in which a node in one layer has a weighted connection to each node of the next layer in a particular configuration. A node, as a processing unit, receives inputs from other nodes or from an external stimulus. A weighted sum of these inputs constitutes the argument to an activation or transfer function.
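The computation performed by a single node takes only a few lines. This is a minimal sketch that assumes a sigmoid transfer function and no bias term; the function name `node_activation` is illustrative.

```python
import math


def node_activation(inputs, weights):
    """Pass the weighted sum of a node's inputs through a sigmoid transfer function."""
    weighted_sum = sum(w * u for w, u in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-weighted_sum))


print(node_activation([1.0, 0.5], [2.0, -4.0]))  # weighted sum is 0, so this prints 0.5
```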
Most applications have used 3-layer networks consisting of one input, one hidden and one output layer. The hidden nodes are needed to introduce nonlinearity into the network; in some cases, more hidden layers are necessary to approximate a higher-order function. An input node provides an external signal to the network, an output node produces an output of the network as a whole, and a hidden node is necessary for the computation of complex functions. Node inputs and activations can be discrete, taking on values {0, 1} or {-1, 0, 1}, or continuous, taking on values in the interval [0, 1] or [-1, 1]. Each node $u_i$ computes a single numerical node output, or activation. The output of a node can be the output of the network as a whole and/or an input to other nodes. Every node other than the input nodes computes its new activation $u_i$ as a function of the weighted sum of the inputs directed to it from other nodes:

$$u_i = f\left(\sum_{j} w_{ij}\, u_j\right),$$

where $w_{ij}$ is the weight of the connection from node $u_j$ to node $u_i$, and the activation function $f(\cdot)$ is usually a nonlinear, bounded and piecewise differentiable function such as the sigmoid function

$$f(x) = \frac{1}{1 + e^{-x}}.$$

Such an ANN produces a response that is the superposition of $n$ sigmoid functions, where $n$ is the number of hidden nodes, to map a complex function. As more hidden layers are added, the ANN is able to map higher-order functions. Therefore, function mapping with an ANN is more general than the regression of traditional methods.
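For a 3-layer network, applying the node computation described above (a weighted sum passed through the transfer function $f$) first at the hidden layer and then at the output node gives the overall input-output mapping. This sketch assumes $m$ input nodes $x_j$, $n$ hidden nodes with input-to-hidden weights $w_{kj}$ and hidden-to-output weights $v_k$ (symbols introduced here for illustration) and, for simplicity, no bias terms:

```latex
h_k = f\left(\sum_{j=1}^{m} w_{kj}\, x_j\right), \quad k = 1, \dots, n,
\qquad
y = f\left(\sum_{k=1}^{n} v_k\, h_k\right)
  = f\left(\sum_{k=1}^{n} v_k\, f\left(\sum_{j=1}^{m} w_{kj}\, x_j\right)\right)
```

The argument of the output node is a weighted superposition of the $n$ hidden-node sigmoids. Adding hidden nodes adds terms to this sum, while adding hidden layers nests further compositions of $f$.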
In theory, an ANN can learn from past data and generalize over new cases. As a universal function approximator, an ANN can discover associations between groups of elements of the domain space and those of the problem space. It has been shown that an ANN can approximate a functional relationship using a sample of input-output patterns, and learn probabilities and statistical distributions from the data (Cybenko, 1989; Hornik, Stinchcombe & White, 1989).
In practice, an ANN can offer the advantage of computer execution speed. The ability to learn from cases and to train the system with data, rather than to write programs, may be more cost-effective and more convenient when frequent updates are needed (Medsker, 1995). In applications where rules are unknown, an ANN may be able to represent those rules implicitly in its stored connection weights.
However, previous integrations of ANN and ES (Medsker, 1995) were hindered by the availability of examples and by the cognitive effort required of experts to judge these examples. The cognitive limitations of human experts make the assessment of a large sample impractical, if not error-prone. Most importantly, these systems did not consider the trade-offs in multi-criteria decision making, which pervade all business problems. Representing trade-offs in an appropriate rule set for a conventional ES is extremely difficult. Because they take trade-offs into account in their assessments, experts might provide judgments that seem contradictory and/or inconsistent with formal logic. These limitations could be overcome by building an optimal training set for the ANN that contains the holistic assessments of experts on a set of examples defined by an orthogonal plan. Knowledge acquired from this optimal training set then serves as the initial knowledge base for an integrated system of ANN and ES.

Orthogonal Plan as Initial Knowledge Base for Integrated ANN and ES
To alleviate the information-processing burden on human experts in knowledge engineering, one can implement an orthogonal plan to define the minimum sample size to be assessed. In experimental design and analysis of variance, an orthogonal main-effect plan (Addelman, 1962) permits the study of several factor effects without testing every combination of factor levels. A factorial experiment is called symmetrical when each of its factors has the same number of levels; an experiment whose factors have different numbers of levels is called an asymmetrical factorial.
Consider a symmetrical factorial experiment involving $(s^n - 1)/(s - 1)$ factors, each of which has $s$ levels, with $s^n$ treatment combinations, where $s$ is a prime or a power of a prime. These $(s^n - 1)/(s - 1)$ factors can be represented by $n$ factors, each having $s$ levels, and their generalized interactions. Therefore, the treatment combinations of a main-effect plan for $(s^n - 1)/(s - 1)$ factors in $s^n$ trials may be obtained by choosing the treatment combinations of a complete $s^n$ factorial plan and generating the remaining factors from the interactions of the $s^n$ experiment (Addelman, 1962). For an asymmetrical factorial experiment, the orthogonal main-effect plan can be constructed by collapsing factors occurring at $s_1$ levels to factors occurring at $s_i$ levels, using a many-to-one correspondence from the set of $s_1$ levels to the set of $s_i$ levels.
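The construction described above can be sketched for the simplest case, $n = 2$ with a prime $s$, where the generalized interactions of the two basic factors A and B are the columns A + kB (mod s), k = 1, ..., s - 1. The function names are illustrative, levels are coded 0 to s - 1, and neither prime powers (which need Galois-field arithmetic) nor the collapsing step for asymmetrical plans is covered.

```python
from collections import Counter
from itertools import combinations, product


def main_effect_plan(s):
    """Orthogonal main-effect plan for s + 1 factors at s levels in s**2 runs.

    Restricted to n = 2 and a prime s, so that arithmetic mod s is a field.
    Columns: the basic factors A and B of the complete s^2 factorial,
    followed by the generalized-interaction columns A + k*B (mod s), k = 1..s-1.
    """
    plan = []
    for a, b in product(range(s), repeat=2):  # complete s^2 factorial in A and B
        interactions = [(a + k * b) % s for k in range(1, s)]
        plan.append([a, b] + interactions)
    return plan


def is_orthogonal(plan):
    """Check that every pair of columns contains each level combination equally often."""
    n_factors = len(plan[0])
    for i, j in combinations(range(n_factors), 2):
        counts = Counter((run[i], run[j]) for run in plan)
        levels_i = {run[i] for run in plan}
        levels_j = {run[j] for run in plan}
        if len(counts) != len(levels_i) * len(levels_j) or len(set(counts.values())) != 1:
            return False
    return True


plan = main_effect_plan(5)
print(len(plan), "runs,", len(plan[0]), "factors, orthogonal:", is_orthogonal(plan))
```

With s = 5 this yields 6 factors in 25 runs; with s = 7 it yields 8 factors in 49 runs, since $(s^2 - 1)/(s - 1) = s + 1$.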
It has been estimated that, by using an orthogonal design, a set of 49 holistic assessments of production rules can cover the dimensionality of decision problems having up to eight criteria with seven levels per criterion (Barron & Person, 1979). For example, in a four-criterion problem with five levels per criterion, instead of making $5^4 = 625$ assessments to define a set of production rules for a conventional ES, with an orthogonal plan one needs only 25 assessments to capture the main effects of the problem factors. Orthogonal plans not only reduce the information-processing burden on the experts but also serve to define initial knowledge bases for integrated systems of ANN and ES.
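These figures follow from simple counting. A small sketch (function names are illustrative) compares the number of profiles in a complete factorial with the run count and factor capacity of a symmetrical main-effect plan in $s^n$ runs:

```python
def full_factorial_runs(levels, criteria):
    """Number of distinct profiles (candidate production rules) in a complete factorial."""
    return levels ** criteria


def main_effect_plan_capacity(s, n=2):
    """Runs of the symmetrical plan and the maximum number of s-level factors it can hold."""
    return s ** n, (s ** n - 1) // (s - 1)


print(full_factorial_runs(5, 4), main_effect_plan_capacity(5))  # 625 (25, 6)
print(full_factorial_runs(7, 8), main_effect_plan_capacity(7))  # 5764801 (49, 8)
```

A four-criterion, five-level problem has 625 profiles, whereas a 25-run plan can accommodate up to six five-level factors; at seven levels, 49 runs accommodate up to eight factors out of 5,764,801 possible profiles.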
This paper investigates the implementation of an optimal knowledge base designed by an orthogonal plan in an integrated system of ANN and ES. Such integration helps to overcome difficulties often encountered with conventional ES technology. Within the proposed framework, one starts with an appropriate orthogonal plan to define a set of basic examples from the problem domain. The holistic judgments of an expert on these basic examples constitute a training set for the ANN. The holistic assessments are intended to overcome the expert's difficulty in explaining explicitly the production rules of his/her heuristics. Once the decision patterns of this set are learned, the network has acquired an initial knowledge base for the problem domain.
After being trained, the network is used to generalize over new cases, and the expert is asked to evaluate the network's performance on these cases. Any wrong prediction constitutes a counter-example to be added to the training set. The network is then retrained to learn the new decision patterns emerging from the added counter-examples. Note that the integrated system can be put into production even if it has learned only partial information on the problem domain: since the system is trained with the basic patterns of the problem domain, it can generalize well over new cases. The process of "learning by doing" continues as needed to acquire a robust knowledge base for the integrated system. Over time, the system enriches its knowledge base with additional decision patterns as it learns from its own production and from the expert's opinion. This knowledge acquisition process is intuitive, since it resembles the way human beings broaden their knowledge.
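The learning-by-doing cycle described above can be sketched as a loop. This is a minimal sketch, not the authors' implementation: `fit` stands for whatever procedure trains the network (a hypothetical 1-nearest-neighbour stand-in keeps the example short), and `expert_accepts` is a hypothetical placeholder for the expert's verdict on each prediction.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]          # criterion levels of one example
Example = Tuple[Profile, float]    # (profile, expert's preference score)
Model = Callable[[Profile], float]


def nearest_neighbour_fit(training: Sequence[Example]) -> Model:
    """Hypothetical stand-in for ANN training: predict the score of the closest stored example."""
    stored = list(training)

    def predict(profile: Profile) -> float:
        closest = min(stored, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], profile)))
        return closest[1]

    return predict


def learning_by_doing(
    initial: Sequence[Example],
    test_sets: Sequence[Sequence[Example]],
    fit: Callable[[Sequence[Example]], Model],
    expert_accepts: Callable[[float, float], bool],
) -> Tuple[Model, List[Example], List[int]]:
    training = list(initial)   # orthogonal-plan examples judged by the expert
    model = fit(training)      # initial knowledge base
    history = []               # number of counter-examples found in each round
    for test_set in test_sets:
        # Generalize over new cases and keep the ones the expert rejects.
        counter_examples = [(p, score) for p, score in test_set
                            if not expert_accepts(model(p), score)]
        history.append(len(counter_examples))
        if counter_examples:
            training.extend(counter_examples)  # enlarge the training set ...
            model = fit(training)              # ... and retrain
    return model, training, history


def accepts(predicted: float, judged: float) -> bool:
    """Hypothetical expert verdict: accept predictions within 10 points of the judgment."""
    return abs(predicted - judged) <= 10


# Usage with made-up two-criterion examples (not data from the study).
initial = [((1, 1), 10.0), ((5, 5), 90.0)]
test_sets = [[((1, 2), 15.0), ((4, 4), 50.0)], [((3, 4), 45.0)]]
model, training, history = learning_by_doing(initial, test_sets, nearest_neighbour_fit, accepts)
print(history)  # [1, 0]
```

Keeping `fit` as a parameter separates the knowledge acquisition cycle from the choice of learner, so the same loop could wrap a backpropagation network in place of the stand-in.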

An Illustration: Project Evaluation and Economic Appraisal
The following illustrates the implementation of an optimal knowledge base in an integrated system of ANN and ES. The problem concerns project evaluation and economic appraisal of proposals for new products in a manufacturing company. Each proposal is evaluated on five criteria: (i) Net Present Value (NPV) of the cash flows generated over the next five years, (ii) initial capital investment requirement, (iii) market growth rate over the next five years, (iv) capability to market the new product, and (v) prospect of technical success. Before submitting any proposal, the related department estimates the value of each project criterion. Some criteria are quantitative; others are qualitative. In the appraisal process, the quantitative data are converted to categories to avoid the difficulty of dealing with data on a continuous scale. For example, the NPV of cash flows is classified into five categories ranging from one to five million dollars. Similarly, the initial investment is classified into five categories ranging from half a million to two and a half million dollars. In the appraisal, there are trade-offs among the criteria, and no single criterion absolutely dominates the others. The experts rate each proposal on a scale from 0 to 100. In this study, the appraisal was performed by a senior manager, a senior engineer and a chartered accountant, referred to as Expert A, Expert B and Expert C respectively. In this manner, the sample constitutes a set of production rules expressed implicitly by the expert. Using this set, one trains an ANN to acquire an initial knowledge base for the integrated system. To test the performance of the system, one uses the acquired knowledge base to generalize over two test sets. The expert is asked to evaluate the network's prediction of his/her preference on the first test set. If there is any disagreement between a network prediction and the expert's judgment, this counter-example is added to the initial training set. The network is then retrained to learn the new decision pattern and to generalize over the second test set. The knowledge base of the system grows over time with the learning of new patterns emerging from the counter-examples.
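The conversion of quantitative estimates to categories can be sketched as nearest-value binning. The category values below come from the text, but the binning rule, the ascending level order (level 1 = smallest value) and the function name are assumptions; the study does not state its cut-offs or which end of each scale is level 1.

```python
NPV_CATEGORIES = [1.0, 2.0, 3.0, 4.0, 5.0]          # $ millions, from the text
INVESTMENT_CATEGORIES = [0.5, 1.0, 1.5, 2.0, 2.5]   # $ millions, from the text


def to_category(value, categories):
    """Map a continuous estimate to the 1-based index of the nearest category value.

    Ties go to the lower category because list.index returns the first minimum.
    """
    distances = [abs(value - c) for c in categories]
    return distances.index(min(distances)) + 1


print(to_category(2.3, NPV_CATEGORIES), to_category(1.2, INVESTMENT_CATEGORIES))  # 2 2
```

Qualitative criteria such as marketing capability or technical prospects would be rated directly on the same level scale, so no binning is needed for them.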
In this study, the ANN is configured with five input nodes, one for each criterion, five hidden nodes and one output node. The network uses the backpropagation algorithm with a sigmoid transfer function, a learning rate of 1 and a momentum of .9. For Expert C, network training takes longer to learn the decision patterns and production rules. However, the system's knowledge base and its ability to predict the preferences of Expert C increase over time as it learns more about this expert's decision patterns.
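A minimal pure-Python sketch of a 5-5-1 sigmoid network trained by backpropagation with the stated learning rate (1) and momentum (.9). Several details are assumptions not given in the text: online (per-example) weight updates, a bias input on the hidden and output nodes, initial weights drawn uniformly from [-0.5, 0.5], criterion levels 1-5 rescaled to [0, 1], and preference scores divided by 100. The examples are invented, except that the first reuses the example (1 2 2 3 3 50) quoted from Appendix 1; nothing here reproduces the study's results.

```python
import math
import random


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid math.exp overflow for extreme inputs
    return 1.0 / (1.0 + math.exp(-x))


class BackpropNet:
    """Sigmoid network (5 inputs, 5 hidden, 1 output) trained by online backpropagation with momentum."""

    def __init__(self, n_in=5, n_hidden=5, seed=0):
        rng = random.Random(seed)
        # One weight row per hidden node; the extra last weight multiplies a constant bias input of 1.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, kept for the momentum term.
        self.dw_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)

    def _forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, y

    def predict(self, x):
        return self._forward(x)[2]

    def train_example(self, x, target, lr, momentum):
        xb, hb, y = self._forward(x)
        delta_out = (target - y) * y * (1.0 - y)
        # Hidden deltas use the output weights as they were before this update.
        delta_hidden = [self.w_out[k] * delta_out * hb[k] * (1.0 - hb[k])
                        for k in range(len(self.w_hidden))]
        for k in range(len(self.w_out)):
            self.dw_out[k] = lr * delta_out * hb[k] + momentum * self.dw_out[k]
            self.w_out[k] += self.dw_out[k]
        for k, row in enumerate(self.w_hidden):
            for j in range(len(row)):
                self.dw_hidden[k][j] = lr * delta_hidden[k] * xb[j] + momentum * self.dw_hidden[k][j]
                row[j] += self.dw_hidden[k][j]


def rmse(net, data):
    return math.sqrt(sum((t - net.predict(x)) ** 2 for x, t in data) / len(data))


def train(net, data, lr=1.0, momentum=0.9, epochs=2000):
    for _ in range(epochs):
        for x, t in data:
            net.train_example(x, t, lr, momentum)
    return rmse(net, data)


# Invented examples: five criterion levels (1-5) followed by a 0-100 preference score.
raw = [(1, 2, 2, 3, 3, 50), (5, 4, 5, 4, 5, 90), (2, 1, 3, 1, 2, 30),
       (4, 5, 1, 2, 4, 70), (3, 3, 4, 5, 1, 60), (1, 1, 1, 1, 1, 10)]
data = [([(lvl - 1) / 4 for lvl in row[:5]], row[5] / 100) for row in raw]

net = BackpropNet(seed=0)
initial_rmse = rmse(net, data)
final_rmse = train(net, data)
print(f"RMSE before: {initial_rmse:.4f}, after: {final_rmse:.4f}")
```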

Conclusion
The limitation of conventional ES and traditional machine induction techniques in knowledge engineering lies in their requirement of explicit production rules from an expert on a large structured sample taken from a problem domain. This study has shown that an integrated system of ANN and ES can perform well with an initial knowledge base designed with an orthogonal plan to capture expertise in a problem domain. This optimal sample contains only a subset of all possible production rules in a problem domain. Consequently, much less cognitive effort is required from the experts, since they make judgments on fewer examples. In addition, holistic assessment helps experts to overcome the difficulty of expressing explicitly the production rules of their heuristics. Any counter-examples in generalization are added to the training set to retrain the network and enlarge its knowledge base. This approach is particularly attractive, as the knowledge base of the integrated system grows over time as the system learns new patterns, in the same way human beings broaden their knowledge.
To acquire a robust knowledge base for the integrated ANN-ES system, one may consider aggregating the judgments of multiple experts in a composite training set or in a composite of individual ANNs. The result of this aggregation may provide a starting point for negotiation in a group decision support system. In future work, we shall report on the implementation of such a system and show that, once the system aggregates group preference from individual judgments, it provides a consensus solution for the group in a decision problem.

Table 1. Knowledge Acquisition Process of the Integrated ANN and ES
1- Select an appropriate orthogonal plan to define the optimal training set.
2- Acquire holistic judgments of expert on the training examples to build the initial knowledge base.
3- Train the network to learn the decision patterns from the initial knowledge base.
4- Generalize the trained network over new examples.
5- On the opinion of the expert, add counter-examples to the training set.
6- Retrain the network to acquire a new knowledge base.

Table 2. Project Evaluation and Economic Appraisal: The Problem Domain

For this problem domain, a complete set of 864 production rules would be needed to build a knowledge base for a conventional ES and make such an ES functional. To alleviate this cognitive burden, an orthogonal plan was used in this study to define a sample of only 24 examples of production rules from the problem domain to be assessed. To validate the performance of the integrated system, two test sets, each containing five out-of-sample examples, were set up. Each expert was asked to provide his/her overall preference judgment for each example of the training set and the test sets. For instance, the preference of Expert A for the first training example (1 2 2 3 3 50) in Appendix 1 is interpreted in the following production rule.
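The production rule for this example does not survive in this text. One possible rendering is sketched below, assuming that the first five numbers are the category levels of criteria (i) to (v) in the order listed earlier and that the last number is the preference score; the function name and the rule wording are hypothetical.

```python
CRITERIA = [
    "NPV of cash flows",
    "initial capital investment",
    "market growth rate",
    "capability to market the new product",
    "prospect of technical success",
]


def to_production_rule(example):
    """Render (level_1, ..., level_5, score) as an IF-THEN rule string."""
    *levels, score = example
    conditions = " AND ".join(f"{name} is at level {level}" for name, level in zip(CRITERIA, levels))
    return f"IF {conditions} THEN preference = {score}"


print(to_production_rule((1, 2, 2, 3, 3, 50)))
```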
The training and testing tolerances were set at .1 and .3 respectively. For Expert A, the network converged after 119 training epochs with a Root Mean Square Error (RMSE) of .0304. In generalization over the two test sets, it produced RMSEs of .0170 and .02275 respectively. For Expert B, the network converged after 104 training epochs with an RMSE of .0333; in generalization over the two test sets, it produced RMSEs of .0381 and .0473 respectively. In these cases, the experts judged that the system had provided satisfactory generalization on both test sets. The case of Expert C is illustrated in Appendix 2. In this case, the network converged after 2425 training epochs with an RMSE of .0334. On the first test set, it made two wrong predictions: an estimate of 81.70 for the example with a preference score of 70, and an estimate of 79.42 for the example with a preference score of 65. These two counter-examples were added to the training set, and the network was retrained to learn the new patterns in these examples. In the second run, the network converged after 1769 epochs with an RMSE of .0369. On the second test set, it made only one wrong prediction, an estimate of 93.34 for the example with a preference score of 60.
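To relate these figures, the sketch below defines RMSE and computes the size of Expert C's two counter-example errors on a 0-1 scale. Rescaling preferences by dividing by 100 is an assumption, since the text does not state how the network output was normalized, and the RMSE over only these two examples is not the test-set RMSE reported in the study.

```python
import math


def root_mean_square_error(predictions, targets):
    """Square root of the mean squared difference between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))


# Expert C, first test set: network estimates and the expert's preference scores (from the text).
estimates = [81.70, 79.42]
judgments = [70, 65]
scaled_errors = [abs(e - j) / 100 for e, j in zip(estimates, judgments)]  # assumed 0-100 -> 0-1 scaling
print([round(err, 4) for err in scaled_errors])  # [0.117, 0.1442]
print(round(root_mean_square_error([e / 100 for e in estimates], [j / 100 for j in judgments]), 4))  # 0.1313
```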