Exact Simulation for Fork-Join Networks with Heterogeneous Service

This paper considers a fork-join network with a group of heterogeneous servers in each service station, i.e. servers with different service rates. The main research interests are the equilibrium properties of such fork-join networks, such as the distributions of response times and maximum queue lengths and the load carried by each server. This paper uses exact Monte Carlo simulation to estimate the characteristics of heterogeneous fork-join networks in equilibrium, for which no explicit formulas are available. The algorithm developed is based on coupling from the past. The efficiency of the sampling algorithm is shown both theoretically and via simulation.


Introduction
A fork-join network consists of K parallel service stations, where each incoming job is split into K subtasks at the fork station and processed separately in each service station. When all the K subtasks of a job are completed, they will be joined immediately at the join station and leave the system. Such fork-join networks can be used to model manufacturing systems or computing systems (see Ko and Serfozo (2004), Lebrecht and Knottenbelt (2007) and Dai (2011)).
A typical fork-join network is shown in Figure 1, where each service station has waiting capacity N (the number of task-waiting places). The ith service station has s_i servers, where the jth server has exponential service times with rate µ_ij. If µ_ij = µ_i for all j we say that the network has homogeneous service; if µ_ij, j = 1, …, s_i, are not all the same we say that the network has heterogeneous service. Heterogeneity of service is a common feature of many real multi-server queueing systems: it allows tasks to receive different qualities of service, and heterogeneous service mechanisms arise in almost any manufacturing system. Detailed discussions of heterogeneous service can be found in Kumar and Madheswari (2007). However, heterogeneous service has rarely been treated in queueing theory (Kumar and Madheswari, 2007), especially for fork-join networks. The difficulty is that the stationary distribution of a fork-join network is intractable even when the network is homogeneous (Ko and Serfozo, 2004). Networks with heterogeneous servers are much harder to deal with than networks with homogeneous servers: with heterogeneous servers, the rule by which a waiting task chooses among several available servers must be specified, which makes the Markov model multi-dimensional instead of one-dimensional (see the discussion of queueing models with heterogeneous servers in Cooper (1976)).
For the simplest fork-join network, having two service stations each with a single server, Flatto and Hahn (1984) and Flatto (1985) derived the generating functions of the queue-length distribution. For more general fork-join networks with homogeneous service, existing methods focus on finding approximations or bounds for mean response times and approximations of maximum queue-length distributions with N = ∞ (infinite waiting capacity); see Nelson and Tantawi (1988), Baccelli et al. (1989), Balsoma et al. (1998), Raghavan and Viswanadham (2001) and Ko and Serfozo (2004).
The above methods are analytical approximations; however, simulation results in Ko and Serfozo (2004) demonstrate that their accuracy can be poor in some cases. Dai (2011) considers exact Monte Carlo simulation, based on coupling from the past (CFTP) (Propp and Wilson, 1996), to estimate the distributions of response time and queue length for homogeneous fork-join networks with N < ∞. Compared with analytical approximations, one advantage of exact Monte Carlo simulation is that the accuracy is controlled by the number of simulated samples. One may instead consider approximate simulation (for example Markov chain Monte Carlo methods), which draws samples only approximately from equilibrium; however, it is very difficult to assess the quality of such approximate samples. In contrast, exact Monte Carlo simulation methods, which draw samples exactly from the target distribution (perfect samples), are preferable in practice. Another advantage of exact Monte Carlo simulation is that it provides empirical distribution estimates for the response time and the maximum queue length. These empirical estimates further provide all characteristics of these distributions, such as the mean, median and quartiles. Existing analytical approximations, however, focus on the mean response time, which captures little information about the distribution of the response time; for example, Dai (2011) showed that the response-time distribution has a long tail and is highly skewed. This paper considers stationary distributions of heterogeneous fork-join networks with N < ∞, for which no analytical formulas are available.
We consider different strategies for allocating subtasks to idle servers when more than one server is available: 1) a subtask chooses the fastest idle server (fastest strategy); 2) a subtask chooses the slowest idle server (slowest strategy); 3) a subtask chooses its server at random from all those idle (random strategy). We propose exact simulation methods, based on CFTP, to generate exact realisations from the equilibrium of such networks. Based on the simulated realisations we provide Monte Carlo estimates for the load carried by servers and for the distributions of maximum queue lengths and response times. This paper is organized as follows. In Section 2 we introduce the model notation. In Sections 3 and 4 we develop a CFTP algorithm with bounding chains to simulate exact realisations from the equilibrium of heterogeneous fork-join networks with the different allocation strategies. Complexities of the algorithms and simulation studies are provided in Section 5. Section 6 provides a discussion.

Notations
Consider the fork-join network in Figure 1. Jobs arrive at the system according to a Poisson process with rate λ. Each incoming job is split into K subtasks which are simultaneously assigned to the K parallel stations for processing. The ith station has s_i servers, where the service rate of the jth server is µ_ij. When all K subtasks of a job are completed, they are joined immediately at the join station (the service time at the join station is 0) and leave the system. Service station i can hold at most N_i = N + s_i subtasks. Denote N = (N_1, …, N_K). If the subtasks of a job are not all completed, the completed subtasks wait in the corresponding buffer of the join station; we assume that the join station has enough places to hold all completed waiting subtasks.
Homogeneous fork-join networks (µ_ij = µ_i for all j = 1, …, s_i) can be represented as a K-dimensional continuous-time Markov chain Q(t) = (Q_1(t), …, Q_K(t)), where Q_i(t) is the number of subtasks at service station i at time t. However, when the µ_ij are different, i.e. each server works at its own characteristic rate, the K-dimensional continuous-time Markov chain Q(t) cannot fully represent the network. For example, Q_i(t) = 1 only tells us that there is one subtask in service at station i, but not which server is active. For such heterogeneous networks the service rate at station i at time t is ∑_{j=1}^{s_i} µ_ij R_ij(t), where R_ij(t) = 0 if the jth server is idle and R_ij(t) = 1 if it is busy at time t (Cooper, 1976). Thus the heterogeneous fork-join network can be represented by R_ij(t), j = 1, …, s_i, i = 1, …, K, together with R_{i,s_i+1}(t), the number of subtasks waiting in the queue of station i.
The fork-join network can then be represented by a Markov process R(t) = (R_1^T(t), …, R_K^T(t))^T. If we denote the state space of R(t) by R, then each element of R is a two-dimensional array with K rows, where the ith row has s_i + 1 elements. The last element R_{i,s_i+1}(t), the number of tasks waiting in the queue of station i, takes integer values from 0 to N. The other elements R_{i1}(t), …, R_{i,s_i}(t) take values 0 or 1, indicating that the corresponding server is idle or active. If R_im(t) = 0 for some m ∈ {1, …, s_i} then R_{i,s_i+1}(t) must be 0.
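As a concrete sketch (a hypothetical Python representation, not part of the paper), a state r ∈ R can be stored as a list of K rows, each holding the s_i 0/1 server flags followed by the queue length:

```python
def min_state(s):
    """Minimum state: every server idle, every queue empty.
    `s` lists the number of servers s_i per station (an assumed layout)."""
    return [[0] * s_i + [0] for s_i in s]

def max_state(s, N):
    """Maximum state: every server busy, every queue full (length N)."""
    return [[1] * s_i + [N] for s_i in s]

def is_valid(state, N):
    """Check the constraint from the text: the queue entry may be positive
    only if all servers of that station are busy."""
    for row in state:
        servers, queue = row[:-1], row[-1]
        if not 0 <= queue <= N:
            return False
        if queue > 0 and 0 in servers:
            return False
    return True
```

For example, for K = 2 stations with s = (2, 3), `min_state([2, 3])` gives `[[0, 0, 0], [0, 0, 0, 0]]`.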

Allocation Strategies
If we label the servers from 1 (the fastest) to s_i (the slowest), the fastest strategy (a subtask chooses the fastest idle server) can be viewed as follows: a newly arrived subtask goes to the idle server with the smallest label, or waits in the queue if all servers are busy. The slowest strategy is analogous, with the slowest server labelled 1 and the fastest labelled s_i. Denote by R the state space of the network and suppose that the current state of the network is r ∈ R. The transition at a new job arrival under the fastest/slowest strategy can be written as r′ = r ⊕ 1, where 1 is a K-dimensional vector with all elements equal to 1 and, for each station i,

r′_ij = r_ij + 1, if j is the smallest server label such that r_ij = 0;
r′_ij = r_ij + 1, if j = s_i + 1 and r_im > 0 for all m ∈ {1, …, s_i};
r′_ij = r_ij, for all other labels j. (1)

Under the random strategy a subtask chooses its server at random from all those idle. We denote the transition at a new job arrival under the random strategy by r′ = r ⊎ 1, defined as r′_i = r_i ⊎ 1 for all i, where

r′_ij = r_ij + 1, if j is chosen at random from the labels with r_ij = 0, j ≤ s_i;
r′_ij = r_ij + 1, if j = s_i + 1 and r_im > 0 for all m ∈ {1, …, s_i};
r′_ij = r_ij, for all other labels j. (2)
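A minimal sketch of the two arrival operators (Python, 0-based labels; the row layout of server flags followed by a queue count is an assumption of this illustration, not the paper's notation):

```python
import random

def arrive_fastest(state, N):
    """r' = r ⊕ 1 as in (1): each subtask takes the idle server with the
    smallest label, or joins the queue if all servers are busy."""
    new = [row[:] for row in state]
    for row in new:
        servers = row[:-1]
        if 0 in servers:
            row[servers.index(0)] = 1   # smallest idle label
        elif row[-1] < N:
            row[-1] += 1                # all busy: wait in the queue
    return new

def arrive_random(state, N, rng=random):
    """r' = r ⊎ 1 as in (2): each subtask picks uniformly among idle servers."""
    new = [row[:] for row in state]
    for row in new:
        idle = [j for j, flag in enumerate(row[:-1]) if flag == 0]
        if idle:
            row[rng.choice(idle)] = 1
        elif row[-1] < N:
            row[-1] += 1
    return new
```

The slowest strategy is `arrive_fastest` applied to rows whose servers are listed slowest first.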

Partial Order
We consider the following partial order on the state space, which will be used later when we develop CFTP algorithms. For r, v ∈ R, define r ≺ v if r_ij ≤ v_ij for all i, j, and r_ij < v_ij for at least one pair (i, j).
Note that for r ∈ R the elements r_ij, j = 1, …, s_i, i = 1, …, K, are bounded since they only take the values 0 and 1. But r_{i,s_i+1}, i = 1, …, K, which represents the number of waiting tasks, can take values from 0 to N. Thus when N = ∞ there is no maximum element in R.
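The componentwise comparison can be checked directly; a small sketch (same hypothetical list-of-rows layout, where states with equal shapes are compared entry by entry):

```python
def order_leq(r, v):
    """r ≼ v: every component of r is at most the matching component of v."""
    return all(x <= y for r_i, v_i in zip(r, v) for x, y in zip(r_i, v_i))
```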

Exact Simulation for Heterogeneous Networks with the Fastest/Slowest Strategy
We assume throughout this paper that λ < ∑_{j=1}^{s_i} µ_ij for all i = 1, …, K. We can simulate R(t) exactly via the CFTP algorithm of Propp and Wilson (1996), which runs Markov chains starting from all different states in the past; if all chains coalesce before time 0, we collect the sample at time 0, which is exactly from equilibrium. This plain CFTP algorithm is not practical since it requires heavy computation to monitor the coalescence of all the different chains. Propp and Wilson (1996) also proposed the monotone CFTP algorithm: if the Markov chains preserve some partial order we only need to run two chains, the upper chain from the maximum state and the lower chain from the minimum state. All other chains are bounded by these two, so if the two chains coalesce, all the other chains coalesce as well. The monotone CFTP algorithm is much more efficient than plain CFTP. When N = ∞, however, there is no maximum element in R under the partial order defined in Section 2.3, since r_{i,s_i+1} ranges from 0 to N, so the monotone CFTP algorithm is not readily available. The case N = ∞ is very challenging, although it might be possible to use the dominated CFTP method of Kendall and Moller (2000). We discuss N = ∞ at the end of this paper; here we simply assume that N < ∞, which is reasonable in practice. Then the elements of R are bounded. However, with N < ∞ the partial order defined in Section 2.3 is not always preserved by the Markov process R(t), as shown later in this section, so the monotone CFTP algorithm is still not available. But since R is bounded, we can use CFTP with bounding chains. Examples of CFTP with bounding chains can be found in Dai (2008), Dai (2011) and Huber (2004).

Continuous Time Markov Chain Simulation for R(t)
The transition rate for R(t) from a state r to another state v is given by

q(r, v) = λ, if v = r ⊕ 1;
q(r, v) = r_ij µ_ij, if v satisfies the first line of (4);
q(r, v) = ∑_{m=1}^{s_i} r_im µ_im, if v satisfies the second line of (4); (3)

where e_ij has the same dimensions as an element of R, with all entries equal to 0 except that the jth entry of the ith row equals 1, and v is obtained from r by

v_ij = max{0, r_ij − 1}, with v equal to r elsewhere, if r_{i,s_i+1} = 0;
v = r − e_{i,s_i+1}, if r_{i,s_i+1} > 0. (4)

Note that in (3) the transition v = r ⊕ 1 means that a new job arrives, so the transition rate is λ. In the first line of equation (4), v_ij = max{0, r_ij − 1} means that either a subtask is completed or no transition occurs at all. This can be explained as follows: the condition r_{i,s_i+1} = 0 means that no subtask is waiting in the queue of station i, so r_ij = 1 and v_ij = 0 correspond to the completion of the subtask at the jth server of station i, while r_ij = 0 and v_ij = 0 correspond to no transition. The corresponding transition rate in (3) is r_ij µ_ij, which is µ_ij if r_ij = 1 and 0 if r_ij = 0.
In the second line of equation (4), r_{i,s_i+1} > 0 means that there are subtasks waiting in the queue of station i, so r_im = 1 for all m ∈ {1, …, s_i} (all servers at station i are active). If the subtask at any server, say server m, of the ith station is completed, then a waiting subtask occupies the mth server immediately and the transition is from r_{i,s_i+1} to v_{i,s_i+1} = r_{i,s_i+1} − 1, which has transition rate ∑_{m=1}^{s_i} r_im µ_im in (3). To simulate R(t) we need to simulate the holding time, the amount of time that the continuous-time Markov chain R(t) spends in the current state r, and then simulate the transition. From (3) the holding time is exponentially distributed with rate

e_r = ∑_{v ≠ r} q(r, v). (5)

At the end of the holding time, R(t) jumps to another state v with transition probability

q(r, v) / e_r. (6)

It is straightforward to simulate holding times and transitions according to (5) and (6). Suppose the current time is τ_l and the current state of R(t) is r; we simulate independent exponential random numbers ξ_0^(l) with rate λ and ξ_ij^(l) with rate µ_ij, j = 1, …, s_i, i = 1, …, K (see Ross (2007) for more details on the properties of exponential distributions). The CFTP algorithm of Propp and Wilson (1996) needs to monitor the coalescence of all the different Markov chains. With the above simulation method, different chains have different holding times T_l, which makes it extremely difficult to monitor coalescence. Thus, in order to develop a CFTP algorithm, we instead consider a thinning algorithm in the following section.
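A direct simulation step following (5) and (6) can be sketched as follows (Python; `mu[i][j]` holds µ_ij, a new job is blocked when some queue is full, and all names are illustrative, not the paper's):

```python
import random

def direct_step(r, lam, mu, N, rng=random):
    """Draw the holding time with the total exit rate e_r (eq. (5)) and pick
    the next transition with probability q(r, v)/e_r (eq. (6))."""
    events = []                                  # (rate, event description)
    for i, row in enumerate(r):
        for j in range(len(row) - 1):
            if row[j] == 1:                      # a busy server may complete
                events.append((mu[i][j], ('serve', i, j)))
    if all(row[-1] < N for row in r):            # a new job can enter
        events.append((lam, ('arrive',)))
    e_r = sum(rate for rate, _ in events)
    hold = rng.expovariate(e_r)                  # exponential holding time
    u = rng.random() * e_r                       # selection proportional to rate
    acc = 0.0
    for rate, event in events:
        acc += rate
        if u <= acc:
            return hold, event
    return hold, events[-1][1]                   # numerical guard
```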

A Thinning Algorithm
We consider possible holding times, defined as T_min^(l) = min{ξ_0^(l), ξ_ij^(l), j = 1, …, s_i, i = 1, …, K}, which is the same for all the different Markov chains. This means that we consider transitions at the same time τ_{l+1} = T_min^(l) + τ_l for all chains, which makes it possible to monitor coalescence. The time points τ = {…, τ_l, τ_{l+1}, …} are possible transition times of R(t); the thinning algorithm selects the actual transition times of R(t) from τ with probability e_r / (λ + ∑_i ∑_{j=1}^{s_i} µ_ij). Proofs of such thinning algorithms can be found in Ross (2007). The algorithm has two parts: one for the case when new jobs can enter the network (waiting places are available), the other for the case when new jobs cannot enter the network (no waiting place available).
Case A. Suppose the current state r is such that r_{i,s_i+1} < N for all i, so that a new job can enter the network. If T_min^(l) = T_l then T_min^(l) is the holding time and we simulate the transition at τ_{l+1} = T_min^(l) + τ_l; if T_min^(l) < T_l then the holding time is greater than T_min^(l) and there is no transition at τ_{l+1}. So the transitions are correctly simulated according to (6).

Case B. Now suppose the current state r is such that r_{i,s_i+1} = N for some i, so that new jobs cannot enter the network since no waiting place is available. If T_min^(l) is achieved by some ξ_ij^(l) then T_min^(l) is the holding time and we simulate the transition at τ_{l+1} = T_min^(l) + τ_l, which can be done as in Case A; if T_min^(l) < T_l then the holding time is greater than T_min^(l) and there is no transition at τ_{l+1}. If T_min^(l) = ξ_0^(l) then there is no transition at τ_{l+1}, since the new job cannot enter the network due to no waiting place being available. The above method tells us how to simulate R(t) between τ_l and τ_{l+1}; by the memoryless property of the exponential distribution we can repeat the whole procedure to simulate R(t) after τ_{l+1}. Altogether, the thinning method correctly simulates R(t) according to the transition rate function (3).
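The thinning update for the fastest strategy can be sketched as one deterministic function of the shared exponentials ξ_0^(l) and ξ_ij^(l) (Python; the state is stored as rows of 0/1 server flags plus a queue count, an assumed layout for this illustration):

```python
def thinning_step(r, xi0, xi, N):
    """One update at tau_{l+1}: the event achieving T_min = min(xi0, xi[i][j])
    fires only if it is possible in state r (Cases A and B above); otherwise
    the state is unchanged.  xi0 ~ Exp(lambda), xi[i][j] ~ Exp(mu_ij)."""
    new = [row[:] for row in r]
    t_min = min([xi0] + [x for row in xi for x in row])
    if t_min == xi0:                              # a new job tries to enter
        if all(row[-1] < N for row in new):       # Case A: space everywhere
            for row in new:
                servers = row[:-1]
                if 0 in servers:
                    row[servers.index(0)] = 1     # fastest strategy
                else:
                    row[-1] += 1
        # otherwise (Case B) the arrival is rejected: no transition
        return new
    for i, row in enumerate(xi):                  # whose service clock rang?
        for j, x in enumerate(row):
            if x == t_min and new[i][j] == 1:     # only a busy server completes
                if new[i][-1] > 0:
                    new[i][-1] -= 1               # waiting subtask takes over
                else:
                    new[i][j] = 0
    return new
```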

Updating Functions and the Partial Order
The updating rule for R(t) based on the thinning method of the previous section can be written as a deterministic function ϕ of the random numbers U^(l) = {ξ_0^(l), ξ_ij^(l), j = 1, …, s_i, i = 1, …, K}. The following theorem shows that in some cases the partial order between a lower chain R(t) and an upper chain R′(t) is preserved by the updating function ϕ.
Suppose that the current time is τ_l and that R(τ_l) = r ≼ R′(τ_l) = r′.
Condition 3.1 The upper-chain state r′ satisfies r′_{i,s_i+1} < N for all i.

Theorem 3.1 If Condition 3.1 holds (there is a waiting place free for a new job in the upper chain R′(t)), then for any chain v(t) such that R(τ_l) = r ≼ v(τ_l) ≼ R′(τ_l) = r′ we have, given the exponential random numbers U^(l),

R(τ_l + a) ≼ v(τ_l + a) ≼ R′(τ_l + a), for 0 < a ≤ T_min^(l).

If Condition 3.1 does not hold, i.e. when a new job arrives the upper chain cannot increase because no waiting place is available, the partial order of the chains v(t) between R(t) and R′(t) may not be preserved, since the lower chain may still increase. Although the partial order is not always preserved, we can still find two bounding chains, an upper-bound chain R̄(t) and a lower-bound chain R̲(t), which bound all Markov chains; when the two bounding chains coalesce, all Markov chains coalesce. This is shown in the following section.

CFTP with Bounding Chains
Define an upper-bound chain R̄(t) and a lower-bound chain R̲(t), which are updated simultaneously as follows.
When a new job arrives, the upper-bound chain is updated as R̄(τ_{l+1}) = R̄(τ_l) ⊕̄ 1, where for each station i

r̄′_ij = r̄_ij + 1, if j is the smallest server label such that r̄_ij = 0;
r̄′_ij = min{N, r̄_ij + 1}, if j = s_i + 1 and r̄_im > 0 for all m ∈ {1, …, s_i};
r̄′_ij = r̄_ij, for all other labels j. (9)
The lower-bound chain updating function is given in (11). The following theorem holds for these upper-bound and lower-bound chains.
Theorem 3.2 Suppose that the upper-bound chain and lower-bound chain are defined with updating functions (9) and (11) respectively, as above. If R̲(t) ≼ R(t) ≼ R̄(t) then R̲(t + a) ≼ R(t + a) ≼ R̄(t + a) for a > 0.
Define the maximum point r̄ by r̄_ij = 1, j = 1, …, s_i, and r̄_{i,s_i+1} = N, and define the minimum point r̲ with all elements equal to 0. In the CFTP algorithm the upper-bound chain R̄(t) starts from r̄ and the lower-bound chain R̲(t) starts from r̲ at some time −M < 0; if the two bounding chains coalesce before time 0, then by Theorem 3.2 all Markov chains coalesce, and the common state at time 0 is an exact draw from equilibrium.
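The overall CFTP-with-bounding-chains loop has a generic shape, sketched below on a simple monotone example (a clamped random walk) rather than the full network. The essential points are that the horizon is extended further into the past and that all random inputs are stored and reused:

```python
import random

def cftp(new_block, top, bottom, step, rng):
    """Generic CFTP skeleton: push fresh randomness into the past until the
    two bounding chains, driven by the SAME inputs from -M to 0, coalesce."""
    blocks = []
    while True:
        blocks.insert(0, new_block(rng))      # older randomness goes first
        up, low = top, bottom
        for block in blocks:                  # replay everything up to time 0
            for u in block:
                up, low = step(up, u), step(low, u)
        if up == low:
            return up                         # exact draw at time 0

# Toy monotone chain on {0,...,5}: the step is order preserving, so the
# chains from the top and bottom states bound every other chain.
def walk(x, u):
    return min(5, max(0, x + u))

sample = cftp(lambda rng: [rng.choice((-1, 1)) for _ in range(8)],
              5, 0, walk, random.Random(42))
```

Note that restarting with fresh randomness instead of replaying the stored blocks would bias the draw; the reuse is what makes CFTP exact.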

Perfect Sampling for Heterogeneous Networks with the Random Strategy
Under the random allocation strategy the transition rate function for R(t) is the same as (3), except that the arrival transition v = r ⊕ 1 is replaced by v = r ⊎ 1, where ⊎ is defined in (2).
We can still use formula (5) for the holding time, with transition probability q(r, v)/e_r as in (6); the only difference is that ⊎ replaces ⊕ when a new job arrives. Thus we can use a thinning algorithm similar to that in Section 3.2 to simulate the Markov chains, except that when a new job arrives the random allocation (operator ⊎) needs to be considered.
At time τ_l, apart from the simulated random numbers U^(l), we also need to simulate the random allocation ⊎ for the new job. For each station i we simulate a sequence of random integers η_i^(l) = (η_i1^(l), η_i2^(l), …), where each η_ig^(l) is drawn uniformly from the server labels {1, …, s_i} of station i. Suppose the current state is R(τ_l) = r. If r is such that at least one r_ij = 0, j = 1, …, s_i, then η_{ig*}^(l) is the label of the randomly chosen server for the new subtask, where g* = min{g : r_{i,η_ig^(l)} = 0}. Denote this randomly chosen label as a function, J(η_i^(l), r_i) = η_{ig*}^(l). This is a rejection sampling method, which draws a server label uniformly from {1, …, s_i} and accepts it if that server is free. Following (2), r′_i = r_i ⊎ 1 can then be written with j = J(η_i^(l), r_i) in the first line of (2). This rejection sampling method guarantees that the partial order is preserved by ⊎ under some conditions, as given by the following lemma.
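The rejection step that defines J(η_i^(l), r_i) can be sketched as follows (Python, 0-based labels; `row` holds the 0/1 server flags of station i followed by the queue count, an assumed layout for this illustration):

```python
import random

def choose_idle_server(row, rng=random):
    """Draw labels uniformly from {0,...,s_i-1} until one points at an idle
    server; the accepted label plays the role of J(eta_i, r_i) in the text."""
    servers = row[:-1]
    if 0 not in servers:
        raise ValueError("no idle server to allocate")
    while True:
        j = rng.randrange(len(servers))
        if servers[j] == 0:
            return j
```

Because every chain consumes the same label sequence η_i^(l), chains that are ordered see related allocations, which is the coupling the lemma relies on.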
With this definition of the random allocation ⊎, the updating function for R(t) can be written as in (15). Define the upper-bound and lower-bound chain updating functions as follows. When a new job arrives, the upper-bound chain is updated by

r̄′_ij = r̄_ij + 1, if j = J(η_i^(l), r̄_i) and at least one r̄_im = 0, m ∈ {1, …, s_i};
r̄′_ij = min{N, r̄_ij + 1}, if j = s_i + 1 and r̄_im > 0 for all m ∈ {1, …, s_i};
r̄′_ij = r̄_ij, for all other labels j. (16)
The lower-bound chain updating function is given in (18).

Theorem 4.1 Suppose that the upper-bound chain and lower-bound chain are defined with updating functions (16) and (18) respectively, as above, and that R(t) is updated according to (15). If R̲(t) ≼ R(t) ≼ R̄(t) then R̲(t + a) ≼ R(t + a) ≼ R̄(t + a) for a > 0.

With Theorem 4.1 we can again use CFTP with bounding chains to simulate R(t) from equilibrium under the random allocation strategy.

Algorithm Complexity
In the CFTP algorithms we only need to simulate transitions at the discrete time points τ = {…, τ_l, τ_{l+1}, …}, although the upper-bound and lower-bound chains are continuous-time chains. Thus, to monitor coalescence, we need to consider coalescence of the discrete-time chains R̄^(l) = R̄(τ_l) and R̲^(l) = R̲(τ_l). Let ν_c be the number of steps needed for R̄^(l) and R̲^(l) to coalesce, and define the distance between R̄^(l) and R̲^(l) as the sum of the componentwise differences. Coalescence occurs when the distance becomes 0. In the CFTP algorithms R̄^(l) starts from the maximum and R̲^(l) starts from the minimum, which are at distance N = ∑_{i=1}^K N_i, so we expect that the larger the value of N, the longer the coalescence time. In this section we show that the expected number of coalescence steps Eν_c is bounded by a polynomial function of N, which means that the running time of the CFTP algorithm increases only polynomially as N increases. This implies that the algorithm is very efficient.
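The distance between the bounding chains is just the sum of the componentwise gaps; a one-line sketch (hypothetical list-of-rows state layout):

```python
def distance(upper, lower):
    """Sum of componentwise gaps between the bounding chains; 0 iff coalesced."""
    return sum(u - l for u_row, l_row in zip(upper, lower)
                     for u, l in zip(u_row, l_row))
```

Starting from the maximum and minimum states the distance is ∑ N_i = ∑ (N + s_i); for example, one station with s_1 = 2 and N = 3 gives 5.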

To calculate a bound for Eν_c, we need to work out a bound for the expected distance at each step. Following the updating rules for the upper-bound and lower-bound chains, from step l − 1 to step l there are three possible cases: event A_1^(l), the distance does not change; event A_2^(l), the distance decreases; and event A_3^(l), the distance increases. Event A_3^(l) corresponds to the case where a new job arrives and the upper-bound chain increases but the lower-bound chain does not move. Following the proof of Lemma 4 in Dai (2011), the probability P(A_3^(l)) converges to 0 at an exponential rate as l → ∞, provided that λ < ∑_j µ_ij for i = 1, …, K; the rate depends only on the network parameters λ and µ_ij. Event A_2^(l) corresponds to the case where a subtask is completed at a server in the upper-bound chain but no subtask is completed in the lower-bound chain because the corresponding server is idle there, so P(A_2^(l)) is bounded below by a positive constant. Event A_1^(l) corresponds to all other possibilities. With the above arguments, since P(A_3^(l)) converges to 0 at an exponential rate, there exists a large constant H (depending only on λ and µ_ij) such that for l ≥ H the distance decreases in expectation at each step. Then (19) and (21) imply that the expected coalescence time Eν_c is bounded by a polynomial function of N.
The running time comparisons for different values of K and N are summarized in Table 1. The simulations were performed on a desktop with a 2.13 GHz Intel Core Duo processor and 3.2 GB of memory. We can see that the algorithm is very efficient.
Note that the above proof holds for all three allocation strategies. This is because the number of coalescence steps depends only on the probabilities that the distance between the two bounding chains increases, decreases or stays the same, which do not depend on the allocation strategy. Therefore the running times of the exact Monte Carlo simulation algorithm under the three allocation strategies should be similar, although the algorithm for the random allocation strategy should take slightly longer since it involves the extra rejection sampling steps. Table 1 demonstrates that the algorithms for the fastest and slowest strategies have similar running times, while the algorithm for the random allocation strategy has longer running times.

Table 1: Running time comparisons for simulating 10,000 realisations from a fork-join network with λ = 1, s_i = 3, i = 1, …, K, and service rates 0.5, 1.0, 1.5 for the three servers in each station. I: the slowest strategy; II: the fastest strategy; III: the random strategy.

                   I                        II                       III
N          500    1000   1500      500    1000   1500      500    1000   1500
K = 5      278s   560s   1090s     281s   569s   1068s     318s   641s   1298s
K = 10     822s   1617s  3241s     773s   1666s  2945s     906s   1751s  3489s
K = 15     1431s  2908s  6048s     1501s  2862s  6060s     1677s  3370s  6664s

A Toy Example
Before the comprehensive simulation studies, we provide a toy example to illustrate how to carry out the CFTP method. We consider the simplest case with one station (no fork-join, just a simple queueing system) having two servers with service rates µ_11 and µ_12 respectively. For simplicity we assume that there is no waiting space, so the system can hold at most 2 jobs, one at each server. The network then has four possible states, (1, 1), (1, 0), (0, 1) and (0, 0), where 0 means the corresponding server is idle and 1 means it is busy. When using CFTP, we run four Markov chains (starting from the four different states) simultaneously from a starting time, say −M, in the past, using the same random numbers.
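The toy example can be coded end to end. The sketch below (Python, fastest strategy, hypothetical parameter values) runs all four chains from the past with shared exponentials, extending the horizon until they agree at time 0:

```python
import random

def toy_cftp(lam, mu1, mu2, rng):
    """Exact equilibrium draw for the two-server loss system; a state is
    the pair (r11, r12) of busy flags."""
    def step(state, u):
        x0, x1, x2 = u                      # Exp(lam), Exp(mu1), Exp(mu2)
        r1, r2 = state
        if min(u) == x0:                    # arrival: fastest strategy
            if r1 == 0: return (1, r2)
            if r2 == 0: return (r1, 1)
            return state                    # both busy: job is lost
        if min(u) == x1:                    # server 1's clock rings
            return (0, r2)
        return (r1, 0)                      # server 2's clock rings
    blocks = []
    while True:
        blocks.insert(0, [(rng.expovariate(lam), rng.expovariate(mu1),
                           rng.expovariate(mu2)) for _ in range(8)])
        finals = set()
        for start in [(0, 0), (0, 1), (1, 0), (1, 1)]:
            x = start
            for block in blocks:            # replay ALL stored randomness
                for u in block:
                    x = step(x, u)
            finals.add(x)
        if len(finals) == 1:                # all four chains coalesced
            return finals.pop()

draw = toy_cftp(1.0, 1.5, 0.5, random.Random(3))
```

A service clock ringing at an idle server causes no transition, matching the thinning construction.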

Simulation Studies
In this section we carry out simulation studies of the distributions of maximum queue lengths, the workloads of each server, and response times.
Let R_i(t) be the number of uncompleted subtasks (in service or waiting) at station i. Then R(t) = max_i R_i(t) represents the number of uncompleted jobs in the network, and we are interested in the distribution of R(t) in equilibrium. Consider a network with K = 3 stations, each with 4 servers (s_1 = s_2 = s_3 = 4). The servers in Station One have service rates 0.2, 0.4, 0.6 and 0.8 respectively; the servers in Station Two have service rates 0.3, 0.6, 0.9 and 1.2 respectively; Station Three is the same as Station Two. The Monte Carlo simulation results for the maximum queue-length distribution are summarized in Table 2. From the results we can see that N = 100 and N = 300 give similar results; this is because the maximum queue length almost never exceeds 10, so a waiting capacity of 100 is large enough to hold all waiting jobs in equilibrium.
The results also show that, on average, the network in equilibrium has the most jobs under the slowest strategy and the fewest jobs under the fastest strategy.
For the above scenario we also analyse the workload of each server. The results are summarized in Table 3, where the entries are the probabilities of the corresponding server being active. Again the workloads are similar under N = 100 and N = 300, but very different under the different allocation strategies: servers with smaller service rates carry a larger workload under the slowest strategy than under the fastest strategy, as expected. Note that Table 3 only gives workloads for the servers in Stations One and Two, since Station Three is the same as Station Two and gives similar results.
Using the simulated realisations of queue lengths from equilibrium, we can easily simulate response times. For the above scenario with N = 100, based on 10,000 simulations from equilibrium, Figure 2 shows the fitted density curves of the response times under the different allocation strategies. The mean response times under the three strategies are 2.18, 3.53 and 4.14. The response-time distributions are highly skewed, and the distribution under the slowest strategy has a much longer tail than the others. Looking only at the mean response times gives no information about the distribution tails, which describe how extreme the response time can be. The 90th percentiles of the response-time distribution under the different allocation strategies are estimated as 4.12, 7.08 and 8.76, which suggests that under the slowest strategy the 10% longest jobs take considerably longer than under the other strategies.
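Given the simulated response times, tail summaries such as the 90th percentile come straight from the empirical distribution; a nearest-rank sketch:

```python
import math

def empirical_percentile(samples, p):
    """Nearest-rank p-th percentile of a sample (0 < p <= 100): exactly the
    kind of tail summary that the mean response time cannot provide."""
    xs = sorted(samples)
    k = math.ceil(p / 100.0 * len(xs))      # nearest-rank index (1-based)
    return xs[k - 1]
```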

Discussion
We presented a perfect sampling method based on CFTP to draw exact realisations from the equilibrium of a fork-join network with heterogeneous service. The proposed method is important because no analytical method is available for such problems. As in Dai (2011), the method only works for fork-join networks with finite capacity N < ∞. When N = ∞ there is no maximum point in the state space, so CFTP with bounding chains is not readily available. Although N < ∞ is reasonable in practice, it is theoretically worthwhile to look for methods that handle N = ∞.
A feasible method for the problem with N = ∞ is the dominated CFTP of Kendall and Moller (2000). The idea requires finding a dominating process which is reversible, bounds the target chains, and can easily be simulated from its equilibrium in reverse time. In terms of the process R(t) in this paper, the components R_{i,s_i+1}(t), i = 1, …, K, are not bounded. One possible way to apply dominated CFTP is to find a univariate dominating process X(t) that bounds max_i R_{i,s_i+1}(t). Note that we can easily find a univariate process X_i(t) (for example a birth-death process with birth rate λ and death rate min_j µ_ij) to bound R_{i,s_i+1}(t). However, if the X_i(t), i = 1, …, K, are independent, then X(t) = max_i X_i(t) may not bound max_i R_{i,s_i+1}(t): when a new job arrives all the queues R_{i,s_i+1}(t) can increase simultaneously, whereas the independent processes X_i(t) need not increase together.

Proof detail for Theorem 3.2 (any chain R(t) is bounded by R̄(t)): if T_min^(l) = ξ_0^(l) and R̄_{i,s_i+1}(τ_l) = N for some i, we use ⊕̄ instead of ⊕, i.e. R̄(τ_{l+1}) = R̄(τ_l) ⊕̄ 1. When R̄_{i,s_i+1}(τ_l) = N, a chain R(t) may have either R_{i,s_i+1}(τ_l) = N or R_{i,s_i+1}(τ_l) < N. (a) If T_min^(l) = ξ_0^(l) and R_{i,s_i+1}(τ_l) = N for some i, then R(τ_{l+1}) = R(τ_l) ≼ R̄(τ_l) ≼ R̄(τ_{l+1}) = R̄(τ_l) ⊕̄ 1, so the partial order is preserved. (b) If T_min^(l) = ξ_0^(l) and R_{i,s_i+1}(τ_l) < N for all i, then R(τ_{l+1}) = R(τ_l) ⊕ 1 ≼ R̄(τ_{l+1}) = R̄(τ_l) ⊕̄ 1, since R_{i,s_i+1}(τ_l) + 1 ≤ min{N, R̄_{i,s_i+1}(τ_l) + 1}. Therefore the chain R(t) is always bounded by R̄(t).