Anomaly Detection of Clinical Behavior Sequences

Identifying abnormal clinical behavior during treatment is of great significance for standardizing medical practice. Because clinical behaviors are constrained by time and by the temporal order of subsequences, this paper modifies the GSP algorithm and introduces the concept of legal subsequences to describe that temporal order, so that frequent patterns can be detected in behavior sequences. Sequence association rules consistent with the characteristics of behavior in this domain are then screened with association rule methods to build a rule base. Finally, anomalies are detected by comparing the similarity between the frequent patterns mined from the behavior under test and the normal behavior rules. Experiments verify the validity of the method.


Introduction
Abnormal behavior in clinical diagnosis is medical behavior that is inconsistent with the prescribed diagnosis and treatment (Yang, 2004, pp. 601-603). Nonstandard treatment not only disrupts normal treatment but also harms the patient's physical and mental health. Anomaly detection in clinical behaviors primarily seeks unreasonable or unusual clinical behavior in the course of treatment, with the aim of regulating diagnosis and treatment behavior and preventing illegal activities. Its key step is to establish a model of normal clinical behavior and to use that model to compare and assess current clinical behavior.
Association rules are mainly used to analyze links among different attributes of the same record, and they are among the earliest data mining techniques applied to anomaly detection, for example in mining behaviors for intrusion detection systems (Bruno, 2007) and credit card fraud detection (Sanchez, 2009, pp. 3630-3640). The behaviors in those applications are independent of one another, occur in order, and are Boolean data. In (Kheliaf, 2007), continuous data were processed with association rule mining, but those data were not continuous in order and did not occur at the same time. The relationships among clinical behaviors, by contrast, contain both independent parts and continuous ones. Intraoperative blood transfusion, for example, is always carried out together with other behaviors. Clinical behaviors in the diagnosis and treatment of a single disease under a clinical pathway follow a certain sequence, and that sequence is relatively constant and stable. For example, the clinical pathway for surgical treatment of acute appendicitis includes routine examination, preventive medication, anesthesia, surgery, pathological examination, post-operative check-up, and so on. The relationships among these behaviors are complicated and strongly correlated; therefore, association patterns among behavior sequences are mined using association rule methods.
Consequently, in view of the characteristics of clinical behaviors, we develop an anomaly detection method for clinical behaviors based on association rules. We find correlations among clinical behavior sequences under normal conditions, derive sequence association rules from them, and finally determine whether the behaviors under test are abnormal according to the similarity between their rule set and the rule set of normal behavior. Figure 1 shows the anomaly detection model for clinical behaviors, which is based on behavior rules in the process of clinical diagnosis and treatment. In the learning stage of modeling, normal clinical data must be collected, so the rules learned are those that hold under normal conditions. During detection, if several behaviors do not conform to these rules, those behaviors may be abnormal.

Analysis
Behaviors in the diagnosis and treatment of a single disease under a clinical pathway are the diagnosis and treatment behaviors related to that disease. A sequence association rule for clinical behaviors has the following form: $p_1, p_2, \ldots, p_n \rightarrow P$. The rule states that after clinical behaviors $p_1, p_2, \ldots, p_n$ have occurred during treatment, the diagnosis and treatment behavior $P$ is always carried out. If the behavior in a given treatment process conforms to this rule, it may be normal; otherwise it is abnormal.
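As a minimal sketch of how a single treatment record might be checked against one rule of this form, the Python function below looks for the antecedent behaviors in order and then for the consequent after them. The function name `conforms`, the flat list representation of a record, and the choice to treat a record in which the antecedent never occurs as conforming are all assumptions made for illustration; the paper does not specify them.

```python
from typing import List


def conforms(behaviors: List[str], antecedent: List[str], consequent: str) -> bool:
    """Check one treatment record against a rule p_1, ..., p_n -> P.

    Assumption: a record in which the antecedent never fully occurs is
    treated as conforming, because the rule does not apply to it.
    """
    pos = 0
    for p in antecedent:
        try:
            # Find the next antecedent behavior at or after the current position.
            pos = behaviors.index(p, pos) + 1
        except ValueError:
            return True  # antecedent not matched: rule not applicable
    # The consequent must appear somewhere after the matched antecedent.
    return consequent in behaviors[pos:]


# Hypothetical records built from the appendicitis pathway named in the Introduction.
rule_antecedent = ["preventive medication", "anesthesia"]
rule_consequent = "surgery"
record_ok = ["routine examination", "preventive medication", "anesthesia", "surgery"]
record_bad = ["routine examination", "preventive medication", "anesthesia",
              "pathological examination"]

ok_result = conforms(record_ok, rule_antecedent, rule_consequent)
bad_result = conforms(record_bad, rule_antecedent, rule_consequent)
```

Here `ok_result` is `True` and `bad_result` is `False`, since the second record never reaches surgery after anesthesia.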
In (Wang, 2008), the clinical sequences obtained from the sequence generation algorithm have the following two properties:
(1) Each event has a starting time and a closing time as constraint attributes, and events may be related sequentially or in parallel.
(2) Behaviors are continuous.
Because clinical behaviors are constrained by time, continuous behaviors with starting and closing times impose legality constraints on subsequences. For example, consider the subsequence $s_1$ = <ABD> of $s$ = <A{B,C}{B,D}>, and let st(X) and et(X) denote the starting and closing times of behavior X. B and D are concurrent in $s$, but $s_1$ represents the relationship et(B) < st(D), that is, B and D as sequential behaviors, which partly distorts the real situation. Consequently, $s_1$ is a subsequence of $s$ but not a legal one that conforms to the characteristics of actual clinical behavior. Whether a sequence is legal depends on whether two of its item sets intersect, and a legal subsequence must strictly satisfy this constraint between its item sets. If a sequence is not legal, the association rules derived from it are not necessarily all invalid. For example, from the illegal subsequence $s_1$ = <ABD> described above, we can derive the rules $R_1$: <AB> → <D> and $R_2$: <A> → <BD>. The sequences on both sides of $R_1$ are legal subsequences, so $R_1$ is a legal rule, whereas $R_2$ is not. This is because $R_1$ assigns the items that cause the illegality to opposite sides of the rule, which eliminates the source of the illegal subsequence and guarantees that the sequences on both sides are legal. Accordingly, a sequence with only one illegal part can still produce legal rules.
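The legality argument above can be illustrated with a small Python sketch. It assumes that each behavior carries a (starting time, closing time) interval and that a sequence is legal when every behavior in one item set closes strictly before every behavior in the next item set starts; this condition, the names `is_legal` and `rule_is_legal`, and the example times are assumptions for illustration, not the paper's exact formulation.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # (st, et): starting and closing time


def is_legal(seq: List[Set[str]], times: Dict[str, Interval]) -> bool:
    """Return True if consecutive item sets do not overlap in time.

    Assumption: only adjacent item sets are compared, and legality means
    et(a) < st(b) for every a in the earlier set and b in the later set.
    """
    for earlier, later in zip(seq, seq[1:]):
        for a in earlier:
            for b in later:
                if times[a][1] >= times[b][0]:
                    return False
    return True


def rule_is_legal(antecedent: List[Set[str]], consequent: List[Set[str]],
                  times: Dict[str, Interval]) -> bool:
    """A rule is legal when the sequences on both sides are legal."""
    return is_legal(antecedent, times) and is_legal(consequent, times)


# Hypothetical times for s = <A{B,C}{B,D}>: B is still running when D starts.
times = {"A": (0, 1), "B": (2, 6), "C": (2, 3), "D": (4, 6)}

s1_legal = is_legal([{"A"}, {"B"}, {"D"}], times)             # <ABD>
r1_legal = rule_is_legal([{"A"}, {"B"}], [{"D"}], times)      # <AB> -> <D>
r2_legal = rule_is_legal([{"A"}], [{"B"}, {"D"}], times)      # <A> -> <BD>
```

With these example times, `s1_legal` is `False`, `r1_legal` is `True`, and `r2_legal` is `False`, matching the discussion of $R_1$ and $R_2$: splitting B and D across the two sides removes the overlapping pair from either side.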

Correlated definitions
According to the characteristics and form of clinical behavior sequence data, the following definitions are introduced in order to describe association rules for clinical behavior sequences correctly.
Definition 1: Given a sequence $s$, the number of transactions in the data set that contain $s$ is called the support of $s$, denoted support($s$).
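Definition 2: Let $s$ and $s'$ be two sequences in the data set. If their supports are both larger than the given threshold, the confidence of the rule $s \rightarrow s'$ is defined as confidence($s \rightarrow s'$) = support(<$s$ $s'$>) / support($s$), where <$s$ $s'$> denotes $s$ followed by $s'$. If the number of $s'$ supported by $s_1$ is larger than that supported by $s_2$, $s$ is illegal, and vice versa.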
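The two definitions can be made concrete with a short Python sketch. It assumes the usual GSP notion of containment (each item set of the pattern is a subset of a distinct item set of the transaction, in order) and the standard confidence ratio support(<s s'>) / support(s); the function names and the toy data set are hypothetical.

```python
from typing import List, Set

Sequence_ = List[Set[str]]  # a sequence is an ordered list of item sets


def contains(transaction: Sequence_, pattern: Sequence_) -> bool:
    """True if pattern is a subsequence of transaction (greedy left-to-right match)."""
    pos = 0
    for item_set in pattern:
        # Advance to the first transaction item set that includes this pattern item set.
        while pos < len(transaction) and not item_set <= transaction[pos]:
            pos += 1
        if pos == len(transaction):
            return False
        pos += 1
    return True


def support(db: List[Sequence_], pattern: Sequence_) -> int:
    """Definition 1: number of transactions in db that contain pattern."""
    return sum(contains(t, pattern) for t in db)


def confidence(db: List[Sequence_], antecedent: Sequence_, consequent: Sequence_) -> float:
    """Assumed standard form: support(<s s'>) / support(s)."""
    base = support(db, antecedent)
    return support(db, antecedent + consequent) / base if base else 0.0


# Hypothetical toy data set of four behavior sequences.
db = [
    [{"A"}, {"B", "C"}, {"D"}],
    [{"A"}, {"B"}, {"D"}],
    [{"A"}, {"C"}],
    [{"B"}, {"D"}],
]
sup_ab = support(db, [{"A"}, {"B"}])
conf_a_b = confidence(db, [{"A"}], [{"B"}])
```

In this toy data set three sequences contain <A> and two of them contain <AB>, so `sup_ab` is 2 and `conf_a_b` is 2/3.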
Extending a sequence can lead to four situations:
① extending a legal sequence yields a legal sequence;
② extending a legal sequence yields an illegal sequence;
③ extending an illegal sequence yields an illegal sequence;
④ extending an illegal sequence yields a legal sequence.
In the sequence extension step of the GSP algorithm, a newly added item is appended at the very end, so it affects only the last two item sets, and only those two need to be considered. Suppose the sequence is $s = \langle e_1 \ldots e_{k-1} e_k \rangle$ and the new item is $a_i$. The four situations are handled as follows:
(1) In situations ① and ④ the result is a legal sequence, so no further handling is required.
(2) If $s$ is legal but becomes illegal after extension, the newly added item is responsible, and its location is recorded.
(3) If $s$ is already illegal, it is marked, because an illegal subsequence can lead to illegal rules. As analyzed above, rule generation can eliminate one illegal part, and an illegal sequence may become legal after extension. When a sequence contains more than two illegal parts, however, extension cannot eliminate them. Accordingly, an item that causes more than two illegal parts in a sequence is deleted directly and not added to the frequent sequence set. Based on this discussion, two flag bits record two illegal locations, each stored as a positive number giving the location that causes the illegality; 0 means that there is no illegal location.
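The flag bookkeeping described above might look like the following Python sketch. It assumes the two flags are stored as a pair of integers, that a new illegal location fills the first free flag, and that a candidate is dropped once a third illegal location would be needed; the name `extend_flags` and this exact update order are assumptions, and the sketch does not model situation ④, in which an illegal sequence becomes legal again.

```python
from typing import Optional, Tuple

Flags = Tuple[int, int]  # two illegal locations; 0 means "no illegal location"


def extend_flags(flags: Flags, new_illegal_at: int) -> Optional[Flags]:
    """Update the flags after appending one item to a sequence.

    new_illegal_at is the 1-based location at which the appended item
    creates an illegal part, or 0 if the extension creates none.
    Returns None when the candidate should be deleted.
    """
    if new_illegal_at == 0:
        return flags  # situation 1 or 3 without a new illegal part
    first, second = flags
    if first == 0:
        return (new_illegal_at, 0)  # situation 2: first illegal part recorded
    if second == 0:
        return (first, new_illegal_at)  # second illegal part recorded
    return None  # more than two illegal parts: drop the candidate


step1 = extend_flags((0, 0), 3)
step2 = extend_flags(step1, 4)
step3 = extend_flags(step2, 5)
```

Starting from a legal sequence, `step1` is `(3, 0)`, `step2` is `(3, 4)`, and `step3` is `None`, meaning the third extension would be discarded rather than added to the frequent sequence set.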
The frequent sequence generation algorithm is as follows:
Input: data set, support threshold
Output: frequent sequence set $L$
(1) $L_1$ = {large 1-sequences}
(2) For ($k = 2$; $L_{k-1} \neq \emptyset$; $k$++) do begin
(3) $C_k$ = GSP-generate($L_{k-1}$) // join using the GSP method and return after pruning
(4) For each candidate $l_k$ in $C_k$
(5) Calculate its support, delete it if it fails to meet the support threshold, and determine whether $l_k$ is a legal sequence.

The generation of sequence association rules
When association rules are generated from frequent sequences, three points should be taken into account: (1) the time order of the sequences on both sides of a rule; (2) the legality of the sequences on both sides of a rule; (3) the generation of rules from legal and illegal sequences. These three points are illustrated with the rule generation for the sequence $S$ = <{A, C}{B, C, D}E>.
(1) Time order of the sequences on both sides of a rule. The closing time of the antecedent behaviors must always be earlier than the starting time of the intermediate and consequent behaviors. The following strategy can be used to generate association rules. First, the first item of sequence $S$ is moved to the antecedent of the rule. Then, in each generation step, the first item of the consequent is moved to the end of the antecedent. Because the items in an item set are concurrent and cannot be assigned to different time orders, their combinations are placed on both sides of the rule. For example, sequence $S$ yields candidate rules such as $R_1$: <{A, C}B> → <{C, D}E> and $R_2$: <{A, C}D> → <{B, C}E>.
(2) Legality of the sequences on both sides of a rule. The antecedents of $R_1$ and $R_2$ above are illegal subsequences of $S$, so these two rules cannot be established and are deleted. Rule $R_3$: <{A, C}> → <{B, C, D}E> is legal.
(3) Generation of rules from illegal sequences. As long as the illegal items are assigned to opposite sides of a rule, the sequences on both sides are legal. First, the item set in which the illegality occurs is located; for each item in that item set, the sequence that follows it is taken as the consequent of the rule and the remaining sequence as the antecedent. The legality of the sequences on both sides must still be checked.
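The move-one-step-at-a-time strategy in point (1) can be sketched in Python for the simplest case, in which a sequence is only split at item-set boundaries so that concurrent items are never separated. The function name `boundary_rules` is hypothetical, the example sequence is assumed to be <{A, C}{B, C, D}E>, and the sketch deliberately leaves out point (3), the handling of illegal sequences.

```python
from typing import List, Set, Tuple

Sequence_ = List[Set[str]]
Rule = Tuple[Sequence_, Sequence_]  # (antecedent, consequent)


def boundary_rules(seq: Sequence_) -> List[Rule]:
    """Generate rules by moving item sets one by one from consequent to antecedent.

    Only whole item sets are moved, so items that occur concurrently
    always stay on the same side of the rule.
    """
    return [(seq[:i], seq[i:]) for i in range(1, len(seq))]


# Assumed example sequence S = <{A, C}{B, C, D}E>.
S = [{"A", "C"}, {"B", "C", "D"}, {"E"}]
rules = boundary_rules(S)
```

For this sequence the sketch yields two rules, <{A, C}> → <{B, C, D}E> (the legal rule $R_3$ in the text) and <{A, C}{B, C, D}> → <E>. Candidates such as $R_1$ and $R_2$, which split the item set {B, C, D}, are never produced, which is one simple way to enforce the legality requirement in point (2).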

Anomaly detection
After the clinical behavior model has been obtained by association rule mining, anomaly detection is carried out by model comparison. Specifically, a behavior model set $R_1$ is first established under normal conditions, and a behavior model set $R_2$ is mined from the data under test. The similarity similarity($R_1$, $R_2$) between the two behavior models is then calculated (Wang, 2000); it expresses how far the current behavior deviates from normal behavior and is used to determine whether the data under test are abnormal. Similarity ranges from 0 to 1, and a higher value indicates a closer match between the two compared behavior model sets: 1 denotes a complete match and 0 a complete mismatch.
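The similarity measure of (Wang, 2000) is not reproduced in this section, so the Python sketch below substitutes the Jaccard index as a stand-in. It is chosen only because it also ranges from 0 (no shared rules) to 1 (identical rule sets); it should not be read as the measure used in the experiments. The rule representation, the function names, and the threshold value are likewise assumptions.

```python
from typing import FrozenSet, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (antecedent behaviors, consequent behavior)


def similarity(r1: FrozenSet[Rule], r2: FrozenSet[Rule]) -> float:
    """Jaccard index of two rule sets (a stand-in, not the paper's measure)."""
    if not r1 and not r2:
        return 1.0  # two empty rule sets are treated as identical
    return len(r1 & r2) / len(r1 | r2)


def is_abnormal(r_normal: FrozenSet[Rule], r_test: FrozenSet[Rule],
                threshold: float) -> bool:
    """Flag the tested data as abnormal when similarity falls below threshold."""
    return similarity(r_normal, r_test) < threshold


# Hypothetical rule sets for illustration.
r_normal = frozenset({
    (("preventive medication",), "anesthesia"),
    (("anesthesia",), "surgery"),
    (("surgery",), "pathological examination"),
})
r_test = frozenset({
    (("preventive medication",), "anesthesia"),
    (("anesthesia",), "surgery"),
    (("surgery",), "blood transfusion"),
})
sim = similarity(r_normal, r_test)
flagged = is_abnormal(r_normal, r_test, threshold=0.8)
```

The two example sets share two of four distinct rules, so `sim` is 0.5 and, with the illustrative threshold of 0.8, `flagged` is `True`.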

Results and discussions
We tested how well the clinical behavior anomaly detection method recognizes abnormal behavior. The data set used in the experiment was a sample of acute simple appendicitis cases from 2000 to 2006 taken from one hospital. With the goal of "clinical cure, treatment integrity", and after data preprocessing, 1200 different cases were sampled at random for each group, giving three sets denoted $D$, $D_{normal}$, and $D_{abnormal}$. $D$ was the training set, while $D_{normal}$ (without abnormal data) and $D_{abnormal}$ (with abnormal data) were the test sets.
The CSAR-GSP algorithm was used to mine the rule set of training set $D$ with a uniform confidence threshold of 80%; the resulting rules are listed in Table 1.
To compare the similarity of the association rules for abnormal and normal behaviors, association rules were mined from test sets $D_{normal}$ and $D_{abnormal}$ with the same support threshold, yielding rule sets $R_{normal}$ and $R_{abnormal}$, respectively. The similarity between each of these rule sets and the rule set $R$ of training set $D$ was then calculated. To observe the effect of support on similarity, the similarity of the rule sets was measured at different support thresholds; the results are shown in Figure 2. As Figure 2 shows, the rule sets mined from clinical behaviors under normal conditions differ from those mined under abnormal conditions, and the difference in similarity depends on the support threshold.
The experiments indicate that association rule mining of clinical behaviors, combined with a similarity threshold on rule sets, can be used to determine whether clinical behaviors are abnormal.

Conclusions
This paper has described the application of association rule mining to the anomaly detection of clinical behaviors. Because clinical databases are huge, the diagnosis and treatment process differs from case to case, and clinical behaviors and examinations are not highly standardized, detection is difficult. Using a selected subset of the data, we showed that abnormal clinical behaviors can be detected by association rule mining. As an anomaly detection method, however, it is based on limited statistical data and the subjective experience of experts, and it still leaves room for improvement. For example, improving the capability of the anomaly detection algorithm and determining the support and similarity thresholds both need further investigation.