The Impact of the Pattern-Growth Ordering on the Performances of Pattern Growth-Based Sequential Pattern Mining Algorithms

Sequential pattern mining is an efficient technique for discovering recurring structures or patterns in very large datasets. It has been widely addressed by the data mining community and has a very large field of applications, such as cross-marketing, DNA analysis, web log analysis, user behavior and sensor data. Sequential pattern mining aims at extracting a set of attributes shared across time among a large number of objects in a given database. Previous studies have developed two major classes of sequential pattern mining methods, namely the candidate generation-and-test approach, based on either horizontal or vertical data formats and represented respectively by GSP and SPADE, and the pattern-growth approach, represented by FreeSpan and PrefixSpan. In this paper, we study the impact of the pattern-growth ordering on the performance of pattern growth-based sequential pattern mining algorithms. To this end, we introduce a class of pattern-growth orderings, called linear orderings, for which patterns are grown by extending either the current pattern prefix or the current pattern suffix from the same position at each growth-step. We study the problem of pruning and partitioning the search space following linear orderings. Experiments show that the order in which patterns grow has a significant influence on performance.


Introduction
A sequence database consists of sequences of ordered elements or events, recorded with or without a concrete notion of time. Sequences are common, occurring in any metric space that facilitates either partial or total ordering. Customer transactions, codons or nucleotides in an amino acid, website traversal, computer networks, DNA sequences and characters in a text string are examples of where the existence of sequences may be significant and where the detection of frequent (totally or partially ordered) subsequences might be useful. Sequential pattern mining has arisen as a technology to discover such subsequences. A subsequence, such as buying first a PC, then a digital camera, and then a memory card, if it occurs frequently in a customer transaction database, is a (frequent) sequential pattern.
In this paper, we study the impact of the pattern-growth ordering on the performance of pattern growth-based sequential pattern mining algorithms. The study aims at enhancing the understanding of the pattern-growth approach. To this end, the key concepts upon which that approach relies, namely pattern-growth direction, pattern-growth ordering, search space pruning and search space partitioning, are revisited. We introduce a class of pattern-growth orderings, called linear orderings, for which patterns are grown by extending either the current pattern prefix or the current pattern suffix from the same position at each growth-step. This class contains PrefixSpan (Pei et al., 2001; Pei et al., 2004) and involves both unidirectional and bidirectional growth; it is thus a generalization of PrefixSpan. However, it does not contain FreeSpan (Han et al., 2000), which grows patterns from any position. We study the problem of pruning and partitioning the search space following linear orderings. Experiments show that the order in which patterns grow has a significant influence on performance.
The rest of the paper is organized as follows. Section 2 presents the formal definition of the problem of sequential pattern mining. Section 3 presents previous results. Section 4 presents the theoretical contribution of the paper. Section 5 presents experimental results. Concluding remarks are given in Section 6.

Problem statement and Notation
The problem of mining sequential patterns, and its associated notation, can be given as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of literals, termed items, which comprise the alphabet. An itemset is a subset of items. A sequence is an ordered list of itemsets. A sequence $s$ is denoted by $\langle s_1, s_2, \ldots, s_n \rangle$, where $s_j$ is an itemset. $s_j$ is also called an element of the sequence, and is denoted as $(x_1, x_2, \ldots, x_m)$, where $x_k$ is an item. For brevity, the brackets are omitted if an element has only one item, i.e. element $(x)$ is written as $x$. An item can occur at most once in an element of a sequence, but can occur multiple times in different elements of a sequence. The number of instances of items in a sequence is called the length of the sequence. A sequence with length $l$ is called an $l$-sequence. The length of a sequence $\alpha$ is denoted $|\alpha|$. A sequence $\alpha = \langle a_1 a_2 \ldots a_n \rangle$ is called a subsequence of another sequence $\beta = \langle b_1 b_2 \ldots b_m \rangle$, and $\beta$ a supersequence of $\alpha$, denoted as $\alpha \subseteq \beta$, if there exist integers $1 \le j_1 < j_2 < \ldots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$. Symbol $\varepsilon$ denotes the empty sequence. We are given a database $S$ of input-sequences. A sequence database is a set of tuples of the form $\langle sid, s \rangle$, where $sid$ is a sequence_id and $s$ a sequence. A tuple $\langle sid, s \rangle$ is said to contain a sequence $\alpha$ if $\alpha$ is a subsequence of $s$. The support of a sequence $\alpha$ in a sequence database $S$ is the number of tuples in the database containing $\alpha$, i.e. $\mathrm{support}(S, \alpha) = |\{\langle sid, s \rangle \mid \langle sid, s \rangle \in S \text{ and } \alpha \subseteq s\}|$.
It can be denoted as $\mathrm{support}(\alpha)$ if the sequence database is clear from the context. Given a user-specified positive integer denoted min_support, termed the minimum support or the support threshold, a sequence $\alpha$ is called a sequential pattern in the sequence database $S$ if $\mathrm{support}(S, \alpha) \ge$ min_support. A sequential pattern with length $l$ is called an $l$-pattern. Given a sequence database and the min_support threshold, sequential pattern mining is to find the complete set of sequential patterns in the database.
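The definitions of subsequence and support can be made concrete with a minimal Python sketch. The helper names (`seq`, `is_subsequence`, `support`) and the toy database are assumptions introduced here for illustration only; they are not part of the paper.

```python
from typing import FrozenSet, List, Tuple

Seq = List[FrozenSet[str]]          # a sequence: ordered list of itemsets
Database = List[Tuple[int, Seq]]    # a database: tuples <sid, s>


def seq(*elements: str) -> Seq:
    """Build a sequence; each string lists the items of one element."""
    return [frozenset(e) for e in elements]


def is_subsequence(alpha: Seq, beta: Seq) -> bool:
    """Return True iff alpha is a subsequence of beta."""
    j = 0
    for a in alpha:
        # Greedily match a against the earliest unused element of beta.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True


def support(db: Database, alpha: Seq) -> int:
    """Number of tuples of db whose sequence contains alpha."""
    return sum(1 for _sid, s in db if is_subsequence(alpha, s))


# Toy database, made up for this illustration.
toy_db: Database = [
    (1, seq("a", "abc", "ac", "d", "cf")),
    (2, seq("ad", "c", "bc", "ae")),
    (3, seq("ef", "ab", "df", "c", "b")),
    (4, seq("e", "g", "af", "c", "b", "c")),
]

print(support(toy_db, seq("a", "c")))    # 4
print(support(toy_db, seq("ab", "c")))   # 2
```

With min_support = 2, both sequences above would be sequential patterns of the toy database.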

Related work
Sequential pattern mining is an important data mining problem. Since the first proposal of this data mining task and its associated efficient mining algorithms, there has been a growing number of researchers in the field, and tremendous progress (Mabroukeh & Ezeife, 2010) has been made, evidenced by hundreds of follow-up research publications on various kinds of extensions and applications, ranging from scalable data mining methodologies to handling a wide diversity of data types, various extended mining tasks, and a variety of new applications.
The Apriori-based approach forms the vast majority of algorithms proposed in the literature for sequential pattern mining. Apriori-like algorithms depend mainly on the Apriori anti-monotonicity property, which states that any super-pattern of an infrequent pattern cannot be frequent, and are based on the candidate generation-and-test paradigm proposed in association rule mining (Agrawal et al., 1993; Agrawal & Srikant, 1994). This candidate generation-and-test paradigm is carried out by GSP (Agrawal & Srikant, 1995), SPADE (Zaki, 2001), and SPAM (Ayres et al., 2002). Mining algorithms derived from this approach are based on either horizontal or vertical data formats. Algorithms based on the horizontal data format include AprioriAll, AprioriSome and DynamicSome (Agrawal & Srikant, 1995), GSP (Agrawal & Srikant, 1995), PSP (Masseglia et al., 1998) and SPIRIT (Garofalakis et al., 1999), while those based on the vertical data format include SPADE (Zaki, 2001), cSPADE (Zaki, 2000), SPAM (Ayres et al., 2002), LAPIN-SPAM (Yang & Kitsuregawa, 2005), IBM (Savary & Zeitouni, 2005) and PRISM (Gouda et al., 2007; Gouda et al., 2010). The generation-and-test paradigm has the disadvantage of repeatedly generating an explosive number of candidate sequences and scanning the database to maintain the support count information for these sequences during each iteration, which makes these algorithms computationally expensive. To increase their performance, constraint-driven discovery can be carried out. With constraint-driven approaches, systems concentrate only on the patterns of interest to the user, specified through constraints such as minimum support, minimum gap or time interval. Constraints expressed as regular expressions are studied in SPIRIT (Garofalakis et al., 1999).
To alleviate these problems, the pattern-growth approach, represented by FreeSpan (Han et al., 2000), PrefixSpan (Pei et al., 2001; Pei et al., 2004) and their further extensions, namely FS-Miner (El-Sayed et al., 2004), LAPIN (Hsieh et al., 2008; Yang et al., 2007), SLPMiner (Seno & Karypis, 2002) and WAP-mine (Pei et al., 2000), adopts a divide-and-conquer pattern-growth paradigm as follows. Sequence databases are recursively projected into a set of smaller projected databases based on the current sequential patterns, and sequential patterns are grown in each projected database by exploring only locally frequent fragments (Han et al., 2000; Pei et al., 2004). The pattern-growth paradigm removes the need for the candidate generation and prune steps that occur in the Apriori-based algorithms, and repeatedly narrows the search space by dividing a sequence database into a set of smaller projected databases, which are mined separately. Unlike Apriori-based algorithms, pattern-growth algorithms grow longer sequential patterns from the shorter frequent ones. Their major cost is that of forming projected databases recursively. A pseudo-projection method is exploited to reduce this cost: instead of performing a physical projection, one registers the index (or identifier) of the corresponding sequence and the starting position of the projected suffix in that sequence. Pseudo-projection reduces the cost of projection substantially when the projected database can fit in main memory.
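The pseudo-projection idea described above can be sketched as follows. This is a simplified illustration and not the PrefixSpan implementation of Pei et al.: it assumes that every element holds a single item (so a sequence is a plain string), and the function name `prefixspan_pseudo` is hypothetical. A projected database is kept as a list of (sequence index, offset) pointers, and no suffix is ever copied.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def prefixspan_pseudo(db: List[str], min_support: int) -> Dict[str, int]:
    """Mine frequent sequences of single items by prefix growth.

    Each projected database is a list of (sequence index, offset)
    pointers into db, which is the pseudo-projection idea.
    """
    patterns: Dict[str, int] = {}

    def grow(prefix: str, projected: List[Tuple[int, int]]) -> None:
        # For every item, record one pointer per sequence: the position
        # right after the first occurrence of the item in the suffix.
        pointers: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(db[sid])):
                item = db[sid][pos]
                if item not in seen:
                    seen.add(item)
                    pointers[item].append((sid, pos + 1))
        for item in sorted(pointers):
            if len(pointers[item]) >= min_support:
                patterns[prefix + item] = len(pointers[item])
                grow(prefix + item, pointers[item])

    grow("", [(sid, 0) for sid in range(len(db))])
    return patterns


# Toy database, made up for this illustration.
print(prefixspan_pseudo(["abcb", "acb", "bca"], min_support=2))
```

The pointer lists shrink at every growth-step, which reflects the remark above that an offset is enough to identify a projected suffix when growing prefixes.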
PrefixSpan (Pei et al., 2001; Pei et al., 2004) and FreeSpan (Han et al., 2000) differ in their criteria for partitioning projected databases and for growing patterns. FreeSpan creates projected databases based on the current set of frequent patterns without a particular ordering (i.e., pattern-growth direction), whereas PrefixSpan projects databases by growing frequent prefixes. Thus, PrefixSpan follows unidirectional growth whereas FreeSpan follows bidirectional growth. Another difference between FreeSpan and PrefixSpan is that pseudo-projection works efficiently for PrefixSpan but not for FreeSpan. This is because, for PrefixSpan, an offset position clearly identifies the suffix and thus the projected subsequence. For FreeSpan, however, since the next pattern-growth step can be in both forward and backward directions from any position, one needs to register more information on the possible extension positions in order to identify the remainder of the projected subsequences.

Pattern-Growth Directions and Orderings
Definition 1 (Pattern-growth direction). A pattern-growth direction is a direction along which patterns can grow. There are two pattern-growth directions, namely the left-to-right and right-to-left directions. To grow a pattern along the left-to-right (resp. right-to-left) direction is to add one or more items to its right (resp. left) hand side.
Definition 2 (Pattern-growth ordering). A pattern-growth ordering is a specification of the order in which patterns should grow. A pattern-growth ordering is said to be unidirectional iff all the patterns grow along a unique direction. Otherwise it is said to be bidirectional. A pattern-growth ordering is said to be static (resp. dynamic) iff it is fully specified before the beginning of the mining process (resp. iff it is constructed during the mining process).

Definition 3 (Basic-static pattern-growth ordering).
A basic-static pattern-growth ordering, also called basic pattern-growth ordering for the sake of simplicity, is an ordering which is based on a unique pattern-growth direction and grows a pattern at the rate of one item per growth-step.
There are two basic-static pattern-growth orderings, namely left-to-right ordering (also called prefix-growth ordering), which consists in growing a prefix of a pattern at the rate of one item per growth-step at its right hand side, and right-to-left ordering (also called suffix-growth ordering), which consists in growing a suffix of a pattern at the rate of one item per growth-step at its left hand side.

Definition 4 (Basic-dynamic pattern-growth ordering).
A basic-dynamic pattern-growth ordering is an ordering which grows a pattern at the rate of one item per growth-step, and whose pattern-growth direction is determined at the beginning of each growth-step during the mining process. It is denoted *-growth.
Definition 5 (Basic-bidirectional pattern-growth ordering). A basic-bidirectional pattern-growth ordering is an ordering which is based on the two distinct pattern-growth directions and grows a pattern in each direction at the rate of one item per couple of growth-steps.
There are two basic-bidirectional pattern-growth orderings. The prefix-suffix-growth ordering (i.e. left-to-right direction followed by right-to-left direction) grows a pattern at the rate of one item per growth-step during a couple of steps, by first growing a prefix of that pattern (i.e. adding one item at its right-hand side) and then growing the corresponding suffix (i.e. adding one item at its left-hand side). The suffix-prefix-growth ordering (i.e. right-to-left direction followed by left-to-right direction) grows a pattern at the rate of one item per growth-step during a couple of steps, by first growing a suffix of that pattern and then growing the corresponding prefix.
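Definition 6 (Linear pattern-growth ordering). A linear pattern-growth ordering is a series of compositions of *-growth, prefix-growth and suffix-growth orderings, and is denoted $o_0$-$o_1$-$o_2$ … $o_{n-1}$-growth, where $o_i \in \{\text{prefix}, \text{suffix}, *\}$ for $0 \le i \le n-1$. It is said to be static iff $o_i \ne *$ for all $i$. Otherwise, it is said to be dynamic.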
The $o_0$-$o_1$-$o_2$ … $o_{n-1}$-growth linear ordering consists in growing a pattern at the rate of one item per growth-step during a series of $n$ growth-steps, by growing at step $i$ ($0 \le i \le n-1$) a prefix (resp. suffix) of that pattern if $o_i$ denotes prefix (resp. suffix). If $o_i \in \{*\}$, a pattern-growth direction is determined and an item is added to the pattern following that direction. For instance, following the prefix-suffix-suffix-prefix-growth static linear ordering, one should grow a pattern in the following order:
• Growth-step 0: Add one item to the right hand side of a prefix of that pattern.
• Growth-step 1: Add one item to the left hand side of the corresponding suffix of the previous prefix.
• Growth-step 2: Add one item to the left hand side of the previous suffix.
• Growth-step 3: Add one item to the right hand side of the previous prefix.
The prefix-suffix-*-prefix-growth dynamic linear ordering grows patterns as the prefix-suffix-suffix-prefix-growth ordering does, except for the steps $k$ that satisfy $(k \bmod 4) = 2$, i.e. the steps governed by the * component. During such a step, a pattern-growth direction is determined and an item is added to the pattern following that direction.
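The way a linear ordering assigns a direction to each growth-step can be sketched as below. The function names and the `choose` callback are hypothetical. The sketch assumes that growth-steps are numbered from 0 and that the series of $n$ components is repeated cyclically, which is our reading of the modulo condition above.

```python
from typing import Callable, List, Optional

PREFIX, SUFFIX, STAR = "prefix", "suffix", "*"


def parse_ordering(name: str) -> List[str]:
    """Split e.g. 'prefix-suffix-*-prefix-growth' into its components."""
    parts = name.split("-")
    if len(parts) < 2 or parts[-1] != "growth":
        raise ValueError("expected a name ending in '-growth'")
    components = parts[:-1]
    for o in components:
        if o not in (PREFIX, SUFFIX, STAR):
            raise ValueError(f"unknown component: {o!r}")
    return components


def is_static(components: List[str]) -> bool:
    """A linear ordering is static iff it has no '*' component."""
    return STAR not in components


def direction_at(components: List[str], step: int,
                 choose: Optional[Callable[[int], str]] = None) -> str:
    """Pattern-growth direction used at a given growth-step.

    choose is only called for '*' components and must return
    'prefix' or 'suffix'; how to choose is left open here.
    """
    o = components[step % len(components)]
    if o == STAR:
        if choose is None:
            raise ValueError("a dynamic ordering needs a chooser")
        o = choose(step)
    return "left-to-right" if o == PREFIX else "right-to-left"


static = parse_ordering("prefix-suffix-suffix-prefix-growth")
print(is_static(static), [direction_at(static, k) for k in range(4)])

dynamic = parse_ordering("prefix-suffix-*-prefix-growth")
print(is_static(dynamic), direction_at(dynamic, 2, choose=lambda k: SUFFIX))
```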
FreeSpan and PrefixSpan differ in their criteria for growing patterns. FreeSpan creates projected databases based on the current set of frequent patterns without a particular ordering (i.e., pattern-growth direction). Since a length-k pattern may grow at any position, the search for length-(k+1) patterns needs to check every possible combination, which is costly. Because of this, FreeSpan does not follow a linear ordering. PrefixSpan, however, follows the prefix-growth static ordering, as it projects databases by growing frequent prefixes.
Given a database of sequences, an open problem is to find a linear ordering that leads to the best mining performance over all possible linear orderings.

Search Space Pruning and Partitioning
Definition 7 (Prefix of an itemset). Suppose all the items within an itemset are listed alphabetically. Given an itemset $x = (x_1 x_2 \ldots x_n)$, an itemset $x' = (x_1 x_2 \ldots x_m)$ ($m \le n$) is called a prefix of $x$. If $1 \le m < n$, the prefix is also denoted as $(x_1 x_2 \ldots x_m\_)$.
Definition 8 (The corresponding suffix of a prefix of an itemset). Let $x = (x_1 x_2 \ldots x_n)$ be an itemset. Let $x' = (x_1 x_2 \ldots x_m)$ ($m \le n$) be a prefix of $x$. Itemset $x'' = (x_{m+1} x_{m+2} \ldots x_n)$ is called the suffix of $x$ with regards to prefix $x'$, denoted as $x'' = x/x'$. We also denote $x = x'.x''$. Note that if $x = x'$, the suffix of $x$ with regards to $x'$ is empty. If $1 \le m < n$, the suffix is also denoted as $(\_x_{m+1} x_{m+2} \ldots x_n)$. For example, for the itemset iset = (abcdefgh), (_efgh) is the suffix with regards to the prefix (abcd_) and iset = (abcd_).(_efgh); likewise, (abcdef_) is the prefix with regards to the suffix (_gh) and iset = (abcdef_).(_gh).
The following definition introduces the dot operator. It permits itemset concatenations and sequence concatenations.
Definition 9 ("." operator). Let $e$ and $e'$ be two itemsets that do not contain the underscore symbol (_). Assume that all the items in $e'$ are alphabetically sorted after those in $e$. Let $\gamma = \langle e_1 \ldots e_{n-1}\, a \rangle$ and $\mu = \langle b\, e'_2 \ldots e'_m \rangle$ be two sequences, where $e_i$ and $e'_i$ are itemsets that do not contain the underscore symbol, $a \in \{e, (\_\text{items in } e), (\text{items in } e\_), (\_\text{items in } e\_)\}$ and $b \in \{e', (\_\text{items in } e'), (\text{items in } e'\_), (\_\text{items in } e'\_)\}$. The dot operator is defined as follows.
1. $e.e' = ee'$
Definition 10 (Prefix of a sequence) (Pei et al., 2004). Suppose all the items within an element are listed alphabetically. Given a sequence $\alpha = \langle e_1 e_2 \ldots e_n \rangle$, a sequence $\beta = \langle e'_1 e'_2 \ldots e'_m \rangle$ ($m \le n$) is called a prefix of $\alpha$ if and only if 1) $e'_i = e_i$ for all $i \le m-1$; 2) $e'_m \subseteq e_m$; and 3) all the frequent items in $e_m - e'_m$ are alphabetically sorted after those in $e'_m$. If $e'_m \ne \emptyset$ and $e'_m \subset e_m$, the prefix is also denoted as $\langle e'_1 e'_2 \ldots e'_{m-1} (\text{items in } e'_m\_) \rangle$.
Definition 11 (The corresponding suffix of a prefix of a sequence) (Pei et al., 2004). Let $\alpha = \langle e_1 e_2 \ldots e_n \rangle$ be a sequence and let $\beta = \langle e_1 e_2 \ldots e_{m-1} e'_m \rangle$ ($m \le n$) be a prefix of $\alpha$. Sequence $\gamma = \langle e''_m e_{m+1} \ldots e_n \rangle$ is called the suffix of $\alpha$ with regards to prefix $\beta$, denoted as $\gamma = \alpha/\beta$, where $e''_m = e_m - e'_m$. We also denote $\alpha = \beta.\gamma$. Note that if $\beta = \alpha$, the suffix of $\alpha$ with regards to $\beta$ is empty. If $e''_m$ is not empty, the suffix is also denoted as $\langle (\_\text{items in } e''_m)\, e_{m+1} \ldots e_n \rangle$.
For example, for the sequence s = ⟨a(abc)(ac)(efgh)⟩, ⟨(ac)(efgh)⟩ is the suffix with regards to the prefix ⟨a(abc)⟩, ⟨(_bc)(ac)(efgh)⟩ is the suffix with regards to the prefix ⟨aa⟩, ⟨(_c)(ac)(efgh)⟩ is the suffix with regards to the prefix ⟨a(ab)⟩, and ⟨a(abc)(a_)⟩ is the prefix with regards to the suffix ⟨(_c)(efgh)⟩.
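The prefix and suffix definitions can be checked on the example above with a small sketch. The function name `suffix_of` and the string encoding are assumptions made here: each element is a string of alphabetically sorted items, a leading "_" marks a partial first element, and every item is treated as frequent, so condition 3 of Definition 10 reduces to the last element of the prefix being a leading part of the corresponding element.

```python
from typing import List, Optional


def suffix_of(alpha: List[str], beta: List[str]) -> Optional[List[str]]:
    """Return the suffix alpha/beta, or None if beta is not a prefix of alpha.

    Elements are strings of alphabetically sorted items. A leading '_'
    in the first returned element marks a partial element.
    """
    m = len(beta)
    if m == 0:
        return list(alpha)
    # Condition 1: the first m-1 elements must be identical.
    if m > len(alpha) or beta[:m - 1] != alpha[:m - 1]:
        return None
    last, full = beta[-1], alpha[m - 1]
    # Conditions 2 and 3 (all items assumed frequent): the last element
    # of beta must be a non-empty leading part of the m-th element of alpha.
    if not last or not full.startswith(last):
        return None
    rest = full[len(last):]
    return (["_" + rest] if rest else []) + alpha[m:]


s = ["a", "abc", "ac", "efgh"]          # the sequence a(abc)(ac)(efgh)
print(suffix_of(s, ["a", "abc"]))       # ['ac', 'efgh']
print(suffix_of(s, ["a", "a"]))         # ['_bc', 'ac', 'efgh']
print(suffix_of(s, ["a", "ab"]))        # ['_c', 'ac', 'efgh']
print(suffix_of(s, ["a", "abc", "a"]))  # ['_c', 'efgh']
```

A call such as `suffix_of(s, ["a", "b"])` returns None, since (b) leaves item a behind it and is therefore not a prefix of (abc) in the sense of Definition 10.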
Definition 13 (Extension of the "." operator). Let $S$ be a sequence database and let $\alpha$ be a sequence that may contain the underscore symbol (_). The dot operator is extended as follows. We have $\alpha.S = \{\langle sid, \alpha.s \rangle \mid \langle sid, s \rangle \in S\}$ and $S.\alpha = \{\langle sid, s.\alpha \rangle \mid \langle sid, s \rangle \in S\}$.
Corollary 1 (Associativity of the "." operator). The dot operator is associative for any sequence database $S$ and any three sequences $\alpha$, $\alpha'$ and $\alpha''$ that may contain the underscore symbol (_).
We have the following lemmas.
Lemma 1 (The support of $z$ in $S_{\alpha,\alpha'}$ is that of its counterpart in $S$). Given a sequence database $S$ and two sequences $\alpha$ and $\alpha'$, for any sequence $y$ prefixed with $\alpha$ and suffixed with $\alpha'$, i.e. $y = \alpha.z.\alpha'$ for some sequence $z$, we have $\mathrm{support}(S, y) = \mathrm{support}(S_{\alpha,\alpha'}, z)$.
Let us prove that function $f$ is injective. Consider two tuples of $S$, $\langle sid, y \rangle$ and $\langle sid', y' \rangle$, each having a canonical decomposition following prefix $\alpha$ and suffix $\alpha'$.
Lemma 2 (What does set $\alpha.\mathrm{patterns}(S_{\alpha,\alpha'}).\alpha'$ denote for $\mathrm{patterns}(S)$?). The complete set of sequential patterns of $S$ which are prefixed with $\alpha$ and suffixed with $\alpha'$ is equal to $\alpha.\mathrm{patterns}(S_{\alpha,\alpha'}).\alpha'$, where function patterns denotes the complete set of sequential patterns of its unique argument.
Proof. Let $x$ be a sequence. Assume that $x \in \alpha.\mathrm{patterns}(S_{\alpha,\alpha'}).\alpha'$. This means that $x = \alpha.z.\alpha'$ for some $z \in \mathrm{patterns}(S_{\alpha,\alpha'})$. From lemma 1, we have $\mathrm{support}(S_{\alpha,\alpha'}, z) = \mathrm{support}(S, \alpha.z.\alpha')$. It follows that $x$ is also a sequential pattern in $S$, as $z$ is a sequential pattern in $S_{\alpha,\alpha'}$. Thus, $\alpha.\mathrm{patterns}(S_{\alpha,\alpha'}).\alpha'$ is included in the set of sequential patterns of $S$ which are prefixed with $\alpha$ and suffixed with $\alpha'$. Now, assume that $x$ is a sequential pattern of $S$ which is prefixed with $\alpha$ and suffixed with $\alpha'$. We have $x = \alpha.z.\alpha'$ for some sequence $z$. From lemma 1, we have $\mathrm{support}(S_{\alpha,\alpha'}, z) = \mathrm{support}(S, \alpha.z.\alpha')$. It follows that $z$ is also a sequential pattern in $S_{\alpha,\alpha'}$, as $x$ is a sequential pattern in $S$. This means that $z \in \mathrm{patterns}(S_{\alpha,\alpha'})$. Thus, the complete set of sequential patterns of $S$ which are prefixed with $\alpha$ and suffixed with $\alpha'$ is included in $\alpha.\mathrm{patterns}(S_{\alpha,\alpha'}).\alpha'$.
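The support identity of lemma 1 can be checked numerically on a toy case. The sketch below rests on two loud assumptions: every element holds a single item (so sequences are strings), and the projected database $S_{\alpha,\alpha'}$ keeps, for each sequence, the part lying strictly between the leftmost embedding of the prefix and the rightmost embedding of the suffix. The definition of $S_{\alpha,\alpha'}$ is not reproduced in the text above, so this projection is our own reading, and all function names are hypothetical.

```python
from typing import List


def contains(s: str, pattern: str) -> bool:
    """True iff pattern is a subsequence of s (single-item elements)."""
    it = iter(s)
    return all(ch in it for ch in pattern)


def support(db: List[str], pattern: str) -> int:
    return sum(contains(s, pattern) for s in db)


def end_of_leftmost(s: str, alpha: str) -> int:
    """Index just past the leftmost embedding of alpha in s, or -1."""
    i = 0
    for ch in alpha:
        k = s.find(ch, i)
        if k < 0:
            return -1
        i = k + 1
    return i


def start_of_rightmost(s: str, alpha_p: str) -> int:
    """Start index of the rightmost embedding of alpha_p in s, or -1."""
    j = len(s)
    for ch in reversed(alpha_p):
        j = s.rfind(ch, 0, j)
        if j < 0:
            return -1
    return j


def project(db: List[str], alpha: str, alpha_p: str) -> List[str]:
    """Assumed projection: the middle part of each sequence that can
    hold alpha on its left and alpha_p on its right without overlap."""
    middles = []
    for s in db:
        i, j = end_of_leftmost(s, alpha), start_of_rightmost(s, alpha_p)
        if 0 <= i <= j:
            middles.append(s[i:j])
    return middles


toy_db = ["abcab", "acbb", "bab", "abb"]   # made up for this illustration
projected = project(toy_db, "a", "b")
for z in ["", "c", "b", "cb", "a"]:
    print(repr(z), support(toy_db, "a" + z + "b"), support(projected, z))
```

On this toy database the two supports printed on each line coincide, which is what lemma 1 states under the assumed projection.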
Case 2: $k = m$. This means that $\gamma = \langle e'_1 \ldots e'_{m-1}\, \gamma_m\_ \rangle$ and $\mu = \langle \_\mu_m \rangle$. We have $\_\mu_m \ne \emptyset$ as $\mu \ne \varepsilon$. We also have $\gamma_m\_ \ne e'_m$, as the contrary, i.e. $\gamma_m\_ = e'_m$, implies that $\mu = \varepsilon$. If $\gamma_m\_ = \emptyset$, then $\_\mu_m = e'_m$ and it follows that $\gamma = \langle e'_1 \ldots e'_{m-1} \rangle$ and $\mu = \langle e'_m \rangle$, which corresponds to the first half of the claim of the lemma. Otherwise, we have $\gamma_m\_ \ne \emptyset$ and $\_\mu_m \ne \emptyset$, which leads to the second half of the claim of the lemma.
Case 3: $k \ne 1$, $k \ne m$ and $\gamma_k\_ = \emptyset$. This implies that $\_\mu_k = e'_k$. It follows that $\gamma = \langle e'_1 \ldots e'_{k-1} \rangle$ and $\mu = \langle e'_k \ldots e'_m \rangle$, which corresponds to the first half of the claim of the lemma.
Case 4: $k \ne 1$, $k \ne m$ and $\_\mu_k = \emptyset$. This case is similar to case 3. We have $\gamma_k\_ = e'_k$. This implies that $\gamma = \langle e'_1 \ldots e'_k \rangle$ and $\mu = \langle e'_{k+1} \ldots e'_m \rangle$, which corresponds to the first half of the claim of the lemma.
Case 5: $k \ne 1$, $k \ne m$, $\gamma_k\_ \ne \emptyset$ and $\_\mu_k \ne \emptyset$. This leads to the second half of the claim of the lemma.
Definition 14 (Static and dynamic search-space partitioning). A search space partition is said to be static iff it is fully specified before the beginning of the mining process. It is said to be dynamic iff it is constructed during the mining process.
Lemma 4 (Search-space partitioning based on prefix and/or suffix). We have the following.
1. Let $\{x_1, x_2, \ldots, x_n\}$ be the complete set of length-1 sequential patterns in a sequence database $S$. The complete set of sequential patterns in $S$ can be divided into $n$ disjoint subsets in two different ways:
a. Prefix-item-based search-space partitioning (Pei et al., 2004): The $i$-th subset ($1 \le i \le n$) is the set of sequential patterns with prefix $x_i$.
b. Suffix-item-based search-space partitioning (Pei et al., 2004): The $i$-th subset ($1 \le i \le n$) is the set of sequential patterns with suffix $x_i$.
2. Let $\alpha$ be a length-$l$ sequential pattern and $\{\beta_1, \beta_2, \ldots, \beta_p\}$ be the set of all length-$(l+1)$ sequential patterns with prefix $\alpha$. Let $\alpha'$ be a length-$l'$ sequential pattern and $\{\gamma_1, \gamma_2, \ldots, \gamma_q\}$ be the set of all length-$(l'+1)$ sequential patterns with suffix $\alpha'$. We have:
a. Prefix-based search-space partitioning (Pei et al., 2004): The complete set of sequential patterns with prefix $\alpha$, except for $\alpha$ itself, can be divided into $p$ disjoint subsets. The $i$-th subset ($1 \le i \le p$) is the set of sequential patterns prefixed with $\beta_i$.
b. Suffix-based search-space partitioning (Pei et al., 2004): The complete set of sequential patterns with suffix $\alpha'$, except for $\alpha'$ itself, can be divided into $q$ disjoint subsets. The $j$-th subset ($1 \le j \le q$) is the set of sequential patterns suffixed with $\gamma_j$.
c. Prefix-suffix-based search-space partitioning: The complete set of sequential patterns with prefix $\alpha$ and suffix $\alpha'$, and of length greater than or equal to $l+l'+1$, can be divided into $p$ or $q$ disjoint subsets. In the first partition, the $i$-th subset ($1 \le i \le p$) is the set of sequential patterns prefixed with $\beta_i$ and suffixed with $\alpha'$. In the second partition, the $j$-th subset ($1 \le j \le q$) is the set of sequential patterns prefixed with $\alpha$ and suffixed with $\gamma_j$.
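The partitions of lemma 4 can be illustrated on a small set of patterns. The sketch assumes single-item elements, so a pattern is a string and "prefixed with" reduces to a string prefix test; the function name `partition` and the pattern list are made up for this illustration. Part 1 of the lemma corresponds to an empty prefix and an empty suffix.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition(patterns: Iterable[str], alpha: str = "", alpha_p: str = "",
              by: str = "prefix") -> Dict[str, List[str]]:
    """Partition the patterns prefixed with alpha and suffixed with
    alpha_p, of length >= len(alpha) + len(alpha_p) + 1, as in part (2.c).

    by='prefix' groups on the length-(l+1) prefix (first partition),
    by='suffix' groups on the length-(l'+1) suffix (second partition).
    """
    l, lp = len(alpha), len(alpha_p)
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in patterns:
        if len(p) < l + lp + 1:
            continue
        if not p.startswith(alpha) or not p.endswith(alpha_p):
            continue
        key = p[:l + 1] if by == "prefix" else p[len(p) - lp - 1:]
        groups[key].append(p)
    return dict(groups)


# Hypothetical complete set of sequential patterns of some database.
patterns = ["a", "b", "c", "ab", "ac", "bc", "cb", "acb"]

print(partition(patterns))                             # part (1.a)
print(partition(patterns, by="suffix"))                # part (1.b)
print(partition(patterns, alpha_p="b", by="prefix"))   # part (2.c), first partition
print(partition(patterns, alpha_p="b", by="suffix"))   # part (2.c), second partition
```

Each pattern lands in exactly one group because its key is a fixed-length prefix (or suffix) of the pattern itself, which mirrors the uniqueness argument used in the proof.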
Proof. Parts (1.a) and (2.a) of the lemma are proven in (Pei et al., 2004). The proof of parts (1.b) and (2.b) of the lemma is similar to the proof of parts (1.a) and (2.a). Thus, we only show the correctness of part (2.c).
Let $\mu$ be a sequential pattern of length greater than or equal to $l+l'+1$, with prefix $\alpha$ and suffix $\alpha'$, where $\alpha$ is of length $l$ and $\alpha'$ is of length $l'$. The length-$(l+1)$ prefix of $\mu$ is a sequential pattern according to the Apriori principle, which states that a subsequence of a sequential pattern is also a sequential pattern. Furthermore, $\alpha$ is a prefix of the length-$(l+1)$ prefix of $\mu$, according to the definition of prefix. This implies that there exists some $i$ ($1 \le i \le p$) such that $\beta_i$ is the length-$(l+1)$ prefix of $\mu$. Thus $\mu$ is in the $i$-th subset of the first partition. On the other hand, since the length-$k$ prefix of a sequence is unique, the subsets are disjoint, and this implies that $\mu$ belongs to only one determined subset. Thus, we have (2.c) for the first partition. The proof of (2.c) for the second partition is similar. Therefore we have the lemma.
Corollary 2 (Partitioning $S$ with sets $x_i.\mathrm{patterns}(S_{x_i,\varepsilon})$ and $\mathrm{patterns}(S_{\varepsilon,x_i}).x_i$). Let $\{x_1, x_2, \ldots, x_n\}$ be the complete set of length-1 sequential patterns in a sequence database $S$. The complete set of sequential patterns in $S$ can be divided into $n$ disjoint subsets in two different ways:
1. Prefix-item-based search-space partitioning: The $i$-th subset ($1 \le i \le n$) is $x_i.\mathrm{patterns}(S_{x_i,\varepsilon})$, where function patterns denotes the set of sequential patterns of its unique argument.
2. Suffix-item-based search-space partitioning: The $i$-th subset ($1 \le i \le n$) is $\mathrm{patterns}(S_{\varepsilon,x_i}).x_i$.
Proof. According to part (1.a) of lemma 4, the $i$-th subset is the set of sequential patterns which are prefixed with $x_i$. From lemma 2, this subset is $x_i.\mathrm{patterns}(S_{x_i,\varepsilon})$. Similarly, according to part (1.b) of lemma 4, the $i$-th subset is the set of sequential patterns suffixed with $x_i$. From lemma 2, this subset is $\mathrm{patterns}(S_{\varepsilon,x_i}).x_i$.
Lemma 5 (A linear ordering induces a recursive pruning and partitioning). A linear ordering induces a recursive pruning and partitioning of the search space. The recursive partitioning is static if the linear ordering is static and dynamic otherwise.
Proof. Let us consider the initial sequence database $S$, two integer numbers $l$ and $l'$, a length-$l$ sequential pattern $\alpha$, a length-$l'$ sequential pattern $\alpha'$, and a linear ordering $L_0 = o_0$-$o_1$-$o_2$ … $o_{n-1}$-growth. Note that $\varepsilon.S_{\varepsilon,\varepsilon}.\varepsilon = S$ is the starting database of the recursive pruning and partitioning of the search space. In the following, we show how $L_0$ induces a recursive pruning and partitioning of $\alpha.S_{\alpha,\alpha'}.\alpha'$.
Case 1: $o_0 \in \{\text{prefix}\}$. Let $\{\beta_1.\alpha', \beta_2.\alpha', \ldots, \beta_p.\alpha'\}$ be the set of all length-$(l+l'+1)$ sequential patterns with respect to database $\alpha.S_{\alpha,\alpha'}.\alpha'$, prefixed with $\alpha$ and suffixed with $\alpha'$. From lemma 3, either $\beta_i = \alpha.(x_i)$ or $\beta_i = \alpha.(\_x_i)$, where $x_i$ is an item and $1 \le i \le p$. This implies that $X = \{x_1, x_2, \ldots, x_p\}$ is the complete set of length-1 sequential patterns with respect to database $S_{\alpha,\alpha'}$. It follows that any item that does not belong to $X$ is not frequent with respect to $S_{\alpha,\alpha'}$. Thus, any sequence that contains an item that does not belong to $X$ is not frequent with respect to $S_{\alpha,\alpha'}$, according to the Apriori principle, which states that any supersequence of an infrequent sequence is also infrequent. Because of this, all the infrequent items with respect to $S_{\alpha,\alpha'}$ are removed from the $z$ part (also called the middle part) of every sequence $\alpha.z.\alpha' \in \alpha.S_{\alpha,\alpha'}.\alpha'$. This pruning step leads to a new sequence database $\alpha.S'_{\alpha,\alpha'}.\alpha'$ in which the middle parts of the sequences do not contain infrequent items with respect to $S_{\alpha,\alpha'}$. Then, $\alpha.S'_{\alpha,\alpha'}.\alpha'$ is partitioned according to part (2.c) of lemma 4. The $i$-th sub-database ($1 \le i \le p$) of $\alpha.S'_{\alpha,\alpha'}.\alpha'$, denoted $\alpha.x_i.S'_{\alpha.x_i,\alpha'}.\alpha'$, is the set of subsequences of $\alpha.S'_{\alpha,\alpha'}.\alpha'$ with prefix $\beta_i = \alpha.x_i$ and with suffix $\alpha'$. Each sub-database is in turn recursively pruned and partitioned according to the $L_1 = o_1$-$o_2$ … $o_{n-1}$-growth linear ordering.
Case 2: $o_0 \in \{\text{suffix}\}$. This case is symmetric to case 1: the length-$(l+l'+1)$ sequential patterns prefixed with $\alpha$ and suffixed with $\alpha'$ are written $\alpha.\gamma_j$ ($1 \le j \le q$), the same pruning step is applied, and $\alpha.S'_{\alpha,\alpha'}.\alpha'$ is partitioned according to the second partition of part (2.c) of lemma 4.
Case 3: $o_0 \in \{*\}$. A pattern-growth direction is determined during the mining process. Then, $\alpha.S_{\alpha,\alpha'}.\alpha'$ is recursively pruned and partitioned as in case 1 if the determined direction is left-to-right and as in case 2 otherwise. From definitions 6 and 14, it is easy to see that the recursive partitioning is static if the linear ordering is static and dynamic otherwise.
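The recursive pruning and partitioning of lemma 5 can be sketched for the simplified case where every element holds a single item, so that sequences and patterns are plain strings. This is an illustrative sketch and not the implementation evaluated in the paper: the function name `mine_linear`, the `choose` callback for * components, the cyclic repetition of the ordering and the representation of a projected database as a list of middle parts are all assumptions made here.

```python
from typing import Callable, Dict, List, Optional


def mine_linear(db: List[str], min_support: int, ordering: List[str],
                choose: Optional[Callable[[str, str], str]] = None
                ) -> Dict[str, int]:
    """Mine all sequential patterns by growing a prefix alpha and a
    suffix alpha_p around a shrinking middle part, step by step.

    ordering is a list over {'prefix', 'suffix', '*'}, repeated
    cyclically. choose(alpha, alpha_p) resolves a '*' component.
    """
    patterns: Dict[str, int] = {}

    def grow(alpha: str, alpha_p: str, middles: List[str], step: int) -> None:
        # Pruning: count the items of the middle parts, keep frequent ones.
        counts: Dict[str, int] = {}
        for mid in middles:
            for item in set(mid):
                counts[item] = counts.get(item, 0) + 1
        frequent = sorted(x for x, c in counts.items() if c >= min_support)
        if not frequent:
            return
        keep = set(frequent)
        middles = ["".join(ch for ch in mid if ch in keep) for mid in middles]

        o = ordering[step % len(ordering)]
        if o == "*":
            if choose is None:
                raise ValueError("a dynamic ordering needs a chooser")
            o = choose(alpha, alpha_p)

        # Partitioning: one sub-database per frequent item.
        for x in frequent:
            if o == "prefix":   # left-to-right: extend alpha by x
                new_alpha, new_alpha_p = alpha + x, alpha_p
                sub = [m[m.index(x) + 1:] for m in middles if x in m]
            else:               # right-to-left: extend alpha_p by x
                new_alpha, new_alpha_p = alpha, x + alpha_p
                sub = [m[:m.rindex(x)] for m in middles if x in m]
            patterns[new_alpha + new_alpha_p] = counts[x]
            grow(new_alpha, new_alpha_p, sub, step + 1)

    grow("", "", list(db), 0)
    return patterns


toy_db = ["abcb", "acb", "bca"]   # made up for this illustration
for ordering in (["prefix"], ["suffix"], ["prefix", "suffix"]):
    print("-".join(ordering), sorted(mine_linear(toy_db, 2, ordering).items()))
```

On this toy database the three orderings return the same set of patterns with the same supports; only the order in which the search space is pruned and partitioned differs, which is the effect the experiments are meant to measure.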

Experimental results
The data set used here was collected from the webpage of the SPMF software (Fournier-Viger et al., 2014). This webpage (http://www.philippe-fournier-viger.com/spmf/index.php) provides large data sets in SPMF format that are often used in the data mining literature for evaluating and comparing algorithm performance.

Figure 3.