An Algorithmic Approach to Modelling the Co-Evolution of Parasites and Their Hosts

This paper is a reformulation of the paper, Mode 1958 Evolution 12:158 165, which was written in terms of a deterministic paradigm, using differential equations In this paper, however, the working paradigm will be stochastic, and from the mathematical point of view, it will be a stochastic process that may be viewed as a branching process within a branching process. In particular, it will be assumed that the population of host plants will evolve as a multitype branching process, and the pathogen, which grows on the leaves of the host in every generation of the host, will also be assumed to evolve as a multitype branching processes during each generation of the host. The contents of this paper, were motivated by problems in Agriculture in which Plant Pathologists and Plant Breeders work together to control the damage inflicted by a pathogen on a growing crop of a cultivar such as flax, wheat. and many other cultivars. The focus of attention in this paper is the development of algorithms that will guide the development of software to run Monte Carol simulation experiments taking into account mutations in the host and pathogen. The writing of software to implement the algorithms developed in this paper would require a major effort, and is, therefore, beyond the scope of this paper.


A Stochastic Model Describing the Coevolution of an Obligate Parasite and Its Host
The model formulated in Mode (1958) was based on a system of differential equations that were used as a framework to describe the coevolution of an obligate parasite and its host. These equations were also used to suggest that the host and the parasite populations would coevolve to a state of equilibrium in which both the host and the pathogen coexist. During the 1950s, computing technology was very primitive when compared to technology that is available in the present era. But, nevertheless, some simple calculations were included in the (1958) paper to suggest that a host -pathogen system would indeed converge to a state of a mutual equilibrium in which the host and pathogen populations would coexist indefinitely. Two other papers on the dynamics of Host -Pathogen systems were also published. In Mode 1961, a deterministic formulation, which generalized the model in Mode 1958 was published. A stochastic model of a Host -Pathogen system was published in 1964. In this paper, the working paradigm belonged to a class of stochastic processes known as birth and death processes.
In this paper, however, the model that will be formulated to describe the coevolution of a host -parasite system will belong to a class of stochastic processes called branching processes. To illustrate the ideas, in spring of some year in the evolution of a host -pathogen system, let the symbols H i for i = 1, 2, · · ·, n, denote the number n > 0 of host plants in the initial generation. For the sake of simplicity, it will be assumed that all host plants will be infected with the parasite or some other pathogen. It will also be assumed that the infection of a plant by the pathogen, starts with one spore which leads to a pustule that produces a spore, this spore in turn produces one or more spores and that this process continues until a leaf or leaves of a plant is fully covered by the pustules of the parasite. In the literature on stochastic processes, the type of multiplicative process of spores just described in known as a branching process.
Mathematically, a simple branching process may be described as follows. Let the random variable X 0 = 1 denote the initial pustule on a leaf , and let the random variable ξ, which takes values in the set I = {0, 1, 2, · · ·} of nonnegative integers, denote the number of pustules produce by any initial pustule. It will be assumed that the random variable ξ has a Poison distribution with positive parameter λ and probability density function as a branching process. For the sake of simplicity, it will be assumed that an infection of any plant begins with one spore that which produces one pustule, Then, by assumption, the number of pustules produced by the first pustule is a random variable and is given by the equation where ξ is a realization of the Poisson random variable. In general, let the random variable X n denote the number of pustules on a plant in the n-th generation of the pathogen, and let (ξ k | k = 1, 2, · · ·) denote a sequence of independent random variables whose common distribution is that of the random variable ξ. Then, it follows that the number of pustules on a plant in generation 2 is And in general, this line of reasoning may be extended to conclude that is the number of pustules on a plant for all generations ν = 1, 2, ··· of the pathogen. For the case of flax rust, the generation time of the pathogen is about 10 days. In reality and in any computer simulation experiment, only some finite number m > 0 of generations of the pathogen will be considered.
From this equation, it can be seen that the conditional expectation of X ν , given X ν−1 , is for ν ≥ 1. Therefore, the unconditional expectation of the number of pustules in generation ν ≥ 1 on any plant satisfies the recursive equation for ν ≥ 1. The algorithms just described may be used to program a computer to simulate that number of pustules on any of the n plants under consideration at the end of a season. In any computer experiment, it would be necessary to repeat the infection process just described for each of the n plants under consideration.
For any host plant H i , let the random variable Y i denote the total number of pustules on the plant at the end of the growing season. Then, the total load of pustules on the n > 0 plants under consideration is given by the equation The value of the random variable will influence that quantity and quality of the seeds produced by the plants. In extreme cases, the host plants may not be able to produce a sufficient number of viable seeds to produce the next generation of plants. Under such circumstances, the host-pathogen system would go extinct. But, even though the plants are infected with a pathogen, it may also happen that the plants produce a sufficient number of viable seeds to produce the next generation of plants. In such cases, the host -pathogen system would survive for another generation.
Given the a value of the random variable T, let S (ν) denote the conditional probability that in any generation a plant in generation ν ≥ 1 survives to produce plants in generation ν + 1. If a plant is not infected by a pathogen, then it will be assumed that this conditional survival probability has the form If, however, a plant is infected with a pathogen, then it will be assumed that this survival probability has the form (1.9) Let n denote the number of plants at the end of a season. Then, it will be assumed that the number of plants that will survive and produce seeds in the next generation is a random variable W with a binomial distribution with sample size n and probability p. It is well known that the probability density function of the binomial distribution is International Journal of Statistics and Probability Vol. 9, No. 2;2020 for x = 0, 1, 2, · · ·, n.
In what follows, the notation X ∼ BN (n, p) will be used to denote that a random variable X has a binomial distribution sample size n and probability p. In any computer simulation experiment, a realization of the random variable W would be computed using the formula W ∼ BN (n, S (v | T )) , (1.11) and the realization of the random variable W would be the initial number of plants in the next generation. At this point in a simulation, the algorithms outlined above would be used to continue the Monte Carlo simulation experiment for another season.
As indicated in the title of this section, some obligate parasite was considered as the pathogen in the formulation above, but the model would could also be used for cases in which the pathogen may also grow in the soil or on media prepared in a laboratory.

Genetics of Host and Pathogen
The genetics of pathogenicity and host resistance to flax rust, Melampsora lini (Pers) Lev., has been reported in a pioneering works by Flor (1955 and1956). Through a series of genetic studies, Flor has shown that the host and parasite possess complementary genetic systems. That is to say, any gene in the host for resistance acts if, and only if, there is a corresponding gene in the pathogen for avirulence. The genes for host resistance exist as a series of multiple alleles at five loci, designated as the K, L, M, N, and P in the genome of the host. There is one gene at the K locus, 11 at the L, six at the M, three at the N, and four at the P, for a total of twenty genes. The K, L, and M loci are inherited independently, but the N and P loci are linked with about 26 percent recombination. Unlike the genes for host resistance, the genes for pathogenicity in the pathogen exist at twenty-five separate loci.
Resistance of a host genotype to a particular pathogen genotype occurs whenever any allele in the host (at any one of the five multiple allelic loci) and its complementary gene for avirulence in the pathogen at any one of the twenty-five ( diallelic loci) are present simultaneously. The alleles for host resistance are all dominant or semi-dominant so that the heterozygote as well as the homozygote are resistant. The genes for avirulence in the pathogen are also dominant, with the exception of one locus where the homozygous recessive in combination with one or two doses of a gene in the host is necessary for resistance. Further details on the genetics flax resistance to flax rust may be found in Flor (1955) and(1956).
These complementary genetic systems of the host and parasite are illustrated in the model presented below for the simple case of one locus in both the host and parasite with two alleles at each locus. Let R and r, respectively, denote alleles for resistance and susceptibility to the pathogen in the host, and let A and a denote alleles for avirulence and virulence, respectively, in the pathogen. In this model, there are three genotypes of the host, namely RR, Rr and rr. With respect to the pathogen there are also three genotypes: AA, Aa and aa. The interactions of the three genotypes of the host with three genotypes of the pathogen are illustrated in the table below.  Table 2.1, the symbol RIS denotes resistance of the host to the pathogen, and the symbol S US denotes susceptibility of the host to the pathogen. By inspecting the first row of this table, it can be seen that according to the model under consideration, the allele R in the host for resistance to the pathogen is assumed to be dominant to the recessive allele r for host susceptibility to the pathogen. The dominance effect of the allele A is also illustrated in row 2 of the table. In row 3 of the table, the susceptibility of each of the three genotypes of the host to the virulent genotype aa of the pathogen is indicated.
The simple illustrative case illustrating the gene to gene relationship in the host and pathogen considered in this section could in principle be extended to the case of one locus with multiple alleles. But, such an extension would require a more complex notation, and, therefore, will not be considered in this section. In the next section, however, cases in which the gene for gene relationship in the host and the pathogen will be considered with respect to two or more loci with multiple alleles in the host and pathogen.
International Journal of Statistics and Probability Vol. 9, No. 2;2020 Research on the genetics of host -pathogen systems is still an active field of research, but the focus of attention of current research differs from that of Flor. Currently, quite a number of researchers are studying the genetics of host -pathogen systems by sequencing the genomes of the host and parasite. Among those researchers, who are sequencing the gnomes of host and pathogens are Dr. Robert Brueggeman, Department of Plant Pathology, North Dakota State University. His current focus of attention is the sequencing the genomes of barley and one of its parasites, which is a fungus called net blotch. Two Ph.D. theses Boyle (2009) and Richards (2016) have also been devoted to research on net blotch.
Teams of researchers are working on sequencing the genomes of barley and net blotch. For example, the paper Wyatt et al. 2017 contains an account of the sequencing of the Pyrenophora teres f. teres Isolate 0-1, which the scientific name for net blotch. The paper Mascher et al. 2017 is devoted to an account of the sequencing of the barley genome and the paper by Tamang et al. 2019 is devoted to finding susceptibility/resistance to net blotch in the barley genome. In the paper Richards et al. 2016, there is an account of the fine mapping of the barley genome 6H net blotch susceptibility locus G3.. This brief review will provide the reader with some of the areas of genomic research that is currently in progress among scientists working on resistance of barley to net blotch, other plant species and pathogens.

Linkage of Genes in the Host for Resistance to Two Pathogens
It has been observed by plant geneticists during the past several decades that genes in the host for resistances to two or more pathogens are often linked, i.e. they are on the same chromosome in the host. An example of such linkage was reported in the paper Schaller and Briggs (1955) on genes for bunt resistance in wheat. Currently, one can find many reports on the internet on the location of genes for resistance in partially sequenced genomes of barley and wheat and their functions at the molecular level, but it is beyond the scope to this paper to review this more recent literature. The main thrust of this section is to focus on the formulation of a mathematical structure that accommodates linkage of genes for resistance in the host to two or more pathogens that may be used in computer simulation experiments.
In chapter 2 of the book Mode and Sleeman (2012), a mathematical structure for dealing with linkage of genes at two or more loci is described for diploid organisms. In that chapter a diploid genotype was represented by the symbol (x, y), where x is the allele contributed by the maternal parent and y is the allele contributed by the paternal parent. For the case of two alleles denoted by 0 and 1, four genotypes may be identified; namely, (0, 0) , (0, 1) .
It should be noted that not all cereal species that are important in agriculture are diploids. A plant species is said to be a polyploid, if during its evolution two or more genomes have been incorporated to make a single genome. For example, durum wheat and bread wheat are polyploids. Durum wheat (Triticum durum) for example, has 14 pairs of chromosomes. It is thought that during its evolution, two genomes that were originally 7 pairs of chromosomes, were joined to make a species of 14 pairs of chromosomes. Bread wheat (Triticum aestivum).has 21 pairs of chromosomes, and it is thought that the genome of this species is made of the genomes of three species with genomes of 7 pairs of chromosomes. Polyploidal species may not follow the same laws of inheritance than those of diploid species. Consequently, the linkage structure under consideration may not be applicable for polyploid species. Barley is an important spices in agriculture, and it has a genome made up of 7 pairs of chromosomes. Moreover, this species follows the laws of inheritance for diploid species, and therefore the linkage structure that will be developed will be applicable to experiments with barley.
In chapter 2, section 2 of Mode and Sleeman (2012), formulas are derived that accommodate linkage at two or more loci that can be applied in computer simulation experiments involving numerical calculations. In this section, a revision of some formulas will be derived for the case of two autosomal linked loci. In this case, an arbitrary genotype of diploid individual with respect to two linked loci may be represented in the form where 00 represents the alleles at two loci contributed by the females parent. Similarly, the symbol 11 denoted the alleles contributed by the male parent.
An individual of this genotype may in turn produce four types of gametes; namely 00, 01, 10, and 11. Gametes 00 and 11 represent copies of the maternal and paternal gametes respectively; while 01 and 10 are recombination type gametes containing both maternal and paternal genes. In particular cases, the generic gametic symbols could be identified with particular haplotypes when the focus of attention is at the molecular level, or with phenotypes when the discussion is at the Mendelian level. It should also be noted that the Boolean notation under consideration is not phase dependent with respect to some specific genes at two loci to which attention is being directed, because in terms of Boolean indicators only parental origins of genes are under consideration.
The four types of gametes will be produced by a given genotype with certain probabilities depending on the probability of recombination. Let γ(00), γ(01),γ(10), and γ(11) denote the probabilities an arbitrary genotype produces gametes 00, 01, 10, and 11, respectively. In the two loci case, the linkage (gametic) distribution is the set (γ(00), γ(01), γ(10), γ(11)) (3.2) of non-negative numbers whose sum is one. Because of the complementary nature of the meiotic mechanism, i.e., gametes are almost always produced in pairs, the mechanism of meiosis will be called balanced if the equations are satisfied.
If we let ρ be the probability of recombination, then it follows that and But, if the mechanism of meiosis is balanced, then When a objective of an investigation is to map the locations of two loci on the same chromosome expressed in terms of centimorgens (cM), then the pertinent values of ρ will satisfy the condition 0 ≤ ρ ≤ 1 2 . The case of random assortment occurs when ρ = 1 2 so that the probability of each type of gamete is 1/4. In principle, however, ρ may be any value such that 0 ≤ ρ ≤ 1. Note if 0.5 < ρ ≤ 1, then γ (01) and γ (10) are greater than γ (00) and γ (11).
A problem of considerable theoretical importance in assessing the effects of linkage in populations is that of defining recombination probabilities and finding a relation between these recombination probabilities and the gametic distribution, when the number of loci under consideration is greater than two. An interesting step towards a solution of this problem was made by Schnell (1961), who observed that if one makes the transformation then a certain orthogonality is introduced which leads to an extension to cases of an arbitrary number of linked loci.
By using these equations, it can be seen that These equations can perhaps be most easily comprehended in vector-matrix notation. Let γ T = (γ(00), γ(10)) be 1 × 2 vectors, where the superscript T stands for transpose of a vector or matrix, and let International Journal of Statistics and Probability Vol. 9, No. 2; 2020 denote a 2 × 2 matrix. Observe that the rows and columns of this matrix is orthogonal. Then, in vector-matrix notation the above equation may be written in the form This equation is of interest, because it expresses γ as a function of λ. In a computer simulation experiment, if the components of vector λ are assigned numerical values, then the vector γ is determined. From now on the symbol T will be used to denote the transpose of a matrix.
The following observations are very helpful in finding an extension to the case of an arbitrary number of linked loci. Firstly, so that A 2 transpose is A 2 . When a square matrix remains invariant under the operation of transposition, it is said to be symmetric. Secondly, observe that where I 2 is a 2 × 2 identity matrix., which has the form Equation (3.15) expresses an orthogonality condition which, as we shall see, may be generalized to an arbitrary number of loci greater than two. Thirdly, if the symbols 00 and 10 are regarded as the vectors ξ T 1 = (0, 0) and ξ T 2 = (1, 0), then the matrix A 2 may be represented in the form for i, j = 1, 2. Lastly, we see the column vector λ may be represented in the form This equation could be taken as a definition of the column vector λ. In the next section, these results will be extended to the case of three linked loci.

Linkage of Genes for Resistance in the Host to Three Pathogens
In this section in order to make the paper self contained, a revised version of section 2.5 in Mode and Sleeman 2012 will be presented that will provide a structure to consider genes in the host for resistance to three pathogens. A question that naturally arises is whether the above scheme considered in section 3 can be generalized to any arbitrary number of loci greater than two. As we shall see such a generalization is possible, but the manner in which the scheme may be generalized will not become clear until the three loci case is considered. In the three loci case, an arbitrary diploid genotype may be represented in the form 000 111 , (4.1) where as before the zeros and ones represent genes contributed by maternal and paternal parents, respectively. This genotype is capable of generating 2 3 = 8 types of gametes containing various combinations of maternal and paternal genes.
The set of these eight types of gametes will be represented in the form G = (000, 100, 010, 110, 111, 011, 101, 001) , (4.2) and let the vector denote the linkage distribution, i.e., the set of non-negative numbers, whose sum is one, giving the probability that each type of gamete is produced by the meiotic process. Because the meiotic mechanism is assumed to be balanced, gametes are produced in pairs so that following symmetry or complementary conditions will be assumed. From these symmetry conditions, it follows that it will be sufficient to consider only four probabilities from the linkage distribution in setting up a correspondence with a set of recombination probabilities.
( 4.5) Observe that only four gametic probabilities appear on the right so it may be possible to solve four simultaneous linear equations. Also observe that the symmetry conditions were used in choosing the four gametic probabilities on the right.
By substituting the λ s for the ρ s in these equations, it can be shown that By solving these four equations for γ(000), γ(100), γ(010) and γ (110), it can be seen that Just as in the two loci case, these equations may be most easily comprehended if they are cast in vector-matrix notation. Let γ T = (γ(000), γ(100), γ(010), γ(110)) (4.8) and λ T = (1, λ 13 , λ 23 , λ 12 ) (4.9) denote row vectors, and let denote a 4 × 4 matrix. Then the equations may be written in the compact form γ = 1 2 3 A 3 λ . (4.11) It should also be noted that the matrix A 2 is related to the matrix A 3 by the simple recursion formula (4.12) International Journal of Statistics and Probability Vol. 9, No. 2;2020 From this recursive equation, it can be seen that because A 2 is symmetric. Furthermore, from this equation, it follows that = 2 2I 2 0 0 2I 2 (4.14) = 2 2 I 4 where 0 is a 2 × 2 zero matrix and I 4 is an identity matrix of order 4.
We thus see that orthogonality condition carries over to the three loci case. Moreover, let ξ T 1 = (0, 0, 0), ξ T 2 = (1, 0, 0), ξ T 3 = ((0, 1, 0) and ξ T 4 = (1, 1, 0) denote row vectors. Then, by inspection, it can be seen that for all i, j = 1, 2, 3, 4. Just as in the case of two loci, it also follows, by using the above results and solving for the vector λ, that the equation expresses the vector λ as a linear function of the vector γ, given the matrix A 3 .
At this juncture it is important to note that the ordering of the gametic symbols (000, 100, 010, 110) (4.17) played a basic role in extending the observations made in the case two loci to the case of three loci. Furthermore, the linear relation (4.16) , connecting the vectors λ and γ , suggests that to extend the results for the case of three loci to cases of four or more loci, it would be prudent to simplify the notation by abandoning subscripts on the λ and ρ parameters and replacing them by a function notation of the form λ (ξ) and ρ (ξ) , where ξ is an arbitrary gametic symbol containing 0 s and 1 s.
Given this function notation, the matrix A 3 and the ordering of the elements of the gametic vector γ, equation (4.2) leads to an automatic ordering of the elements of the vector λ. This observation suggests that extensions of equation (4.1) to cases of four or more loci could be used to define the vector λ for an arbitrary number of loci. Then, given any ξ ∈ G, a corresponding recombination probability may be determined by the equation By way of an illustrative example, suppose ξ = 000, indicating that in gametes of this type there was no genetic recombination. For the case of three loci under consideration, it was observed that λ (ξ) = 1, which implies ρ (ξ) = 0, indicating with gametes of type ξ = 000 recombination occurs with probability 0, which is consistent with our intuition. In this paper, however, there will be no attempt to extend the results presented in this section to the cases of four or more loci, because it seems likely that demand for such cases will be very limited. If, however, a reader is interested in considering four or more linked loci, chapter 2 of the book Mode and Sleeman 2012 may be consulted for an account of working with four or more linked loci.
It would be of interest to present an example of data such that the probabilities of recombination for the case of three linked loci could be estimated, but no attempt to construct such an example will be undertaken. For readers who are interested in three point test for linkage, the book by Liu 1998 may be consulted.

Mutations in the Host and Pathogen
It is well known that mutations do occur in plant pathogens. For example, a variety of durum or bread wheat may have been grown extensively in an area for many years, because it's resistant to stem rust, a plant pathogen. However, it may happen that there is a mutation in the pathogen, stem rust, so that a variety of wheat that was resistant to the pathogen is no longer resistant, due to a mutation in the pathogen. As the science of genomics continues to develop, it may become possible to identity the location in the genome of the pathogen where the mutation occurred and characterize it at the molecular level. In this section, however, mutations in the host and pathogen will be described and analyzed mathematically with a view towards providing a framework for studying the interactions of hosts and pathogens as well as their coevolution.
International Journal of Statistics and Probability Vol. 9, No. 2;2020 Let A denote a gene in the pathogen for avirulence, and let a denote an allele for virulence. Similarly, let R let denote a gene in the host for resistance to the pathogen. and r denote an allele for susceptibility in the host . It will be assumed that the interactions of the host and pathogen follow those described in Table 2.1 in section 2. Let µ 12 denote the probability per generation that gene A in the pathogen mutates to gene a for virulence. Similarly, let ν 12 denote the probability per generation that gene R for resistance in the host mutates to gene r, which confers susceptibility of the host to the pathogen. If gene a in the pathogen mutates to gene A for avirulence, then, by definition, a back mutation has occurred.
The set of possible mutations in the pathogen may be represented in a two by two matrix P = ν 11 ν 12 ν 21 ν 22 . (5.1) In the first row of this matrix ν 11 = 1 − ν 12 so that if ν 12 = 10 −9 , then ν 11 , which is the probability a mutation does not occur, is close to 1. Row 2 of this matrix, takes into account back mutations, which may occur with probability ν 21 > 0. In computer simulation experiments, the row of the matrix P are assigned numbers such that the rows of this matrix sum to 1. Similarly, mutations in the host may be represented by the matrix H = µ 11 µ 12 µ 21 µ 22 , (5.2) row sum to 1. Observe that ν 12 is the probability per generation that gene R in the host mutates to gene r. The probabilities in row 2 of the matrix H take into account that back mutations may occur in the host.
It seems likely that before wheats, barley, flax and other plants that were domesticated, they were outbreeders so that in an evolving population random mating was prevalent, because winds and perhaps insects would distribute pollen randomly among plants. According to this view, as these species of plants were domesticated they evolved to a state in which they reproduced by self fertilization. In this situation, the anatomy of a flowering plant evolved in such a way that the pollen fertilizes only its eggs in same flower. Consequently, when formulating models describing the evolution of these species of plants, it will be necessary to take into account not only random mating but also reproduction by self fertilization. in models dealing with the joint evolution of the host and pathogen. It will be assumed that the pathogen reproduces asexually. That is individuals of type A produce only individuals of type A if a mutation does not occur. Similar remarks hold for individuals of type a of the pathogen.
A computer experiment that would mimic that rise of virulent mutations in the pathogens and their impact on populations of domestic wheats, barley and other crops would be of practical interest, because they are actually observed in such populations. Bread and durum wheats are polyploids, but with respect to one locus in which genes for resistance or susceptibility to a pathogen are located, the genetic laws governing these traits are similar to those of diploids. The laws governing the resistance to pathogens in barley follow those of a diploid population. Consider a host pathogen system in which pathogen is haploid and reproduces asexually. It will be assumed that the pathogen has two alleles A for avirulence and a for virulence. As is section 2, let R denote a allele in the host for resistance to the pathogen and let r denote at the same locus for susceptibility to the pathogen. In this model, the host population would consist of three genotypes RR. Rr and rr. In the absence of mutation, when a population reproduces either by random mating or selfing, individuals of genotype RR would produce offspring only of genotype RR. But, is such population individuals of genotype Rr would produce offspring with genotypes RR, Rr and rr with corresponding probabilities 0.25, 0.50, and 0.25. Similarly, in the absence of mutation, individuals of genotype rr would produce offspring only of genotype rr.
Mutations are rare events and thus need to be taken into account in a model according the laws of evolution of a stochastic processes. For example, with regard to the pathogen population, let M denote the number of individuals of genotype A in the population in some generation , and let m denote the number of mutations from genotype A to the virulent genotype a. If these mutations occur independently, then it follows that the random variable m has a binomial distribution with the probability density function for x = 0, 1, 2, · · ·, M. When M is large, then the probability density function in 5.3 may be approximated by a Poisson distribution with parameter Mµ 12 .
As a first step in formulating a model in a host population with respect to one locus, consider a population of plants that are homozygous for a gene R for resistance to some pathogen. In this case, all plants in a population would be of genotype RR. It will be assumed that the population reproduces by selfing and that mutations may occur. Specifically, it will be assumed that allele R may mutate to allele r with probability ν 12 per generation, and that allele r may back mutate to allele R with probability ν 21 per generation, see 5.2. Given a population of individuals of genotype RR, it will be necessary to describe the set of events that may occur when mutation occurs in the evolution of a population. In particular, under the assumption that mutations may occur, in what follows formulas will be derived for the distribution of the three genotypes RR, Rr and rr among the offspring of parents with genotypes RR, Rr and rr.
Observe that under the assumption that copies of allele R mutate independently, it follows that is the probability per generation that neither of the copies of the allele R in the genotype RR mutate to allele r. Suppose that among the offspring of a plant of genotype RR an offspring of genotype Rr is found, indicating that a mutation of the form R −→ r has occurred, This mutation can occur in two ways. Suppose the left allele in the genotype RR does not mutate with probability (1 − µ 12 ), but the right allele does mutate with probability µ 12 . In this case, the probability of the occurrence of the mutant genotype Rr is (1 − µ 12 ) µ 12 . It can be seen by the same line of reasoning, that the probability that that left allele in the genotype RR mutates but the right allele dose not mutate is µ 12 (1 − µ 12 ). The two mutational events just described are disjoint. Therefore, the probability of finding an offspring of genotype Rr among the offspring of an individual of genotype RR is (1 − µ 12 ) µ 12 + µ 12 (1 − µ 12 ) = 2µ 12 (1 − µ 12 ) . (5.5) The genotype rr may also be occur among the offspring of individuals of genotype RR. In this case, both the left and right alleles in the genotype RR have mutated to the allele r. The probability of this rare event is The next step in formulating the model of mutation under consideration is to derive formulas for the probabilities of finding mutant genotypes among the offspring of individuals of genotype Rr. The probability that neither of the alleles in the genotype mutate is The probability that the right allele r in the genotype Rr back mutates to allele R is ν 21 . Therefore, the probability of finding an individual of genotype RR among the offspring of an individual of genotype Rr is (1 − µ 12 ) µ 21 . Note that the argument used to derive this formula is the same as that used in deriving formula 5.5. Finally, the probability of finding an offspring of genotype RR among the offspring of an individual of genotype rr is µ 2 21 . (5.12) The next step towards reaching the goal stated above is to derive a formula the probabilities of finding genotypes RR, Rr and rr among the offspring an individual of the genotype RR. Let T OR RR denote the number of offspring on an individual of genotype RR. From 5.4, 5.5 and 5.6, it can be seen that the total probability of finding non-mutant as well as mutant genotypes among the offspring of an individual of genotype RR is As an aid in deriving formulas for the probabilities of finding mutant offspring RR, Rr and rr among the offspring of genotype RR under the assumption mutation, a random variable X RR will be introduced to indicate the genotype of an offspring of genotype RR. Therefore, the conditional probability of finding an offspring of genotype RR among those of genotype RR is (1 − µ 12 ) 2 (1 − µ 12 ) 2 + 2µ 21 (1 − µ 21 ) + µ 2 12 (5.14) Similarly, Therefore, from 5.8 it follows that ( 5.18) Similarly, from 5.9 it can be seen that µand from 5.10 it can be see that (5.23) The expressions in the formulas derived above will be interpreted as elements in three dimensional 1 × 3 vectors that are defined below. In the next section., it will be shown that these three vectors play an essential role in Monte Carlo simulation experiments designed to study the occurrence of virulent genotypes in the pathogen and genes for susceptibilities in the host that arise the host and pathogen populations by the process of mutation.

Evolution of the Host Population as a Multitype Branching Processes
In this section, the formulation of a mulitype branching process evolving on a time scale of discrete generations will be given along with a description of algorithms to compute a sample of Monte Carlo realizations of such a multitype branching processes. The class of multitype branching processes considered in this section, is an extension of the one type branching process described in section 1. To illustrate the algorithms underlying a Monte Carlo simulation procedures to simulate a sample of realizations of this stochastic process, for the sake of simplicity, attention will be focused on the case of m = 3 types, which will be referred as genotypes. Let G = {1, 2, 3} denote the set of three genotypes. For the case of the host with one locus with two alleles R and r, these three genotypes would be RR, Rr and rr as illustrated in section 2.
The number of offspring will be characterized in terms of random variables N ν for ν ∈ G, taking values in the set I = {n | n = 0, 1, 2, 3, · · ·} of non-negative integers. The probability density functions of these random variables will be denoted by P [N ν ] = g v (n) for n ∈ I (6.1) and ν ∈ G. The expected value of a random variable N ν is and may be interpreted as the average number of offspring contributed to the next generation by each genotype ν ∈ G.
As in the foregoing sections of this paper, the probability density functions for the random variables N ν , ν ∈ G, will be chosen as the simple Poisson densities for n ∈ N and ν ∈ G. It is easy to show that for the density in (6.3), E [N ν ] = λ ν > 0 so that the measure of reproductive success λ v is the parameter for a Poisson density for each genotype. In the experiments reported in this paper, a decision was made to consider only the special case of Poisson distributions for the random variables N ν for ν = 1, 2, 3. It is well known for this simple distribution that the expectation and variance both equal the parameter λ.
A multitype branching process in discrete time may be defined as follows. In generation t, where t = 0, 1, 2, 3, · · ·, let the random function X i (t) , taking values in the set N, denote the number of individuals of genotype i ∈ G in generation t, and let X (t) = (X 1 (t)) , X 2 (t) , X 3 (t) (6.4) denote a vector of these random functions. For t = 0, an experimenter needs to assign initial values in the set N to each of the elements in the vector (6.4) . Let the 1 × 3 vector Y i = (Y i1 , Y i2 , Y i3 ) denote the total number of offspring each genotype produced by genotype i ∈ G in any generation. For each genotype i, let Y i1 , Y i2 , · · · (6.5) denote a sequence of conditionally independent random variables, given X i (t) the number of individuals of genotype i in generation t.Each of the random variables in this sequence has a Poisson distribution with parameter λ i . Then, the number of individuals in the population at time t + 1 of genotype i is Y iν (6.6) for i = 1, 2, 3.
The next step in the formulation of the model is to take into account mutation. To include mutations in the model, it will be necessary to introduce the multinomial distribution. The probability density function of the multinomial distribution for the case of three dimensions is f (z 1 , z 2 , z 3 ) = n! z 1 !z 2 !z 3 ! p s 1 1 p z 2 2 p z 3 3 (6.7) where z 1 + z 2 + z 3 = n (6.8) and p 1 + p 2 + p 3 = 1 (6.9) and p i > 0 for i = 1, 2, 3.
Let p RR denote the 1 × 3 row vector defined in 5.24.Then for an individual of genotype RR,when mutations occur the number of offspring of each of the three genotypes, RR, Rr and rr,is given by the 1 × 3 row vector OFF RR , which is a realization of a multinomial distribution with probability vector p RR and sample size n = X i (t + 1), with genotype i = RR.
For the case of genotype Rr, the number of offspring of each genotype is given by the 1 × 3 vector OFF Rr , which is a realization of a multinomial distribution with probability vector p Rr and sample size n = X i (t + 1) with i = Rr. Lastly, for the case of genotype rr the number of offspring of the three genotypes is given by the 1 × 3 vector OFF rr , which is a realization of a multinomial distribution with a 1 × 3 probability vector p rr and sample size n = X i (t + 1) with i = rr.
An algorithm for simulating a realization of a multinomial random vector may be found in the paper by Mode and Gallop 2008. It will be helpful to arrange the realization of the three 1 × 3 vectors in the form of a 3 × 3 matrix of the form when it is necessary to compute the number of individuals of each genotype when mutations occur in a population. For example, when mutations are taken into account, the number of individuals of genotypes RR is at time t + 1 Similarly, when mutations are taken into account the numbers of individuals in the population of genotypes Rr and rr are given by the sums and X rr (t + 1) = 3 j=1 o f f J3 (6.13)

Embedding a Deterministic Model in a Stochastic Process
Let Z 1 , Z 2 , · · · denote a sequence of random variables taking values in the set of real numbers R = (−∞.∞), and suppose the expectation E Z 2 i is finite for every i = 1, 2, · · ·. It is of interest to find the best estimator of the random variable Z n+1 , given the random variable Z n . Let Z n+1 denote this estimator. Then it is well known that the conditional expectation Thus Z n+1 in 7.1 is the best estimator of Z n+1 in the sense of least squares. In what follows, conditional expectations of the form 7.1 will be used extensively .
The objective of this section is to describe a procedure for embedding a deterministic model in a stochastic process. As a first step in describing the derivation of the embedded model, consider the Galton-Watson process defined by equation 1.4 in section 1 by the equation for ν = 1, 2, · · ·. In this case, for a given ν because for each k it is assumed that the random variable ξ k has a Poison distribution with expectation λ. As indicated above, let X i = X i−1 λ (7.5) for ν = 1, 2, · · ·. http://ijsp.ccsenet.org International Journal of Statistics and Probability Vol. 9, No. 2;2020 For ν = 1 equation 7.5 takes the form (7.6) where E [X 0 ] = X 0 , the initial size of the population, which is an assigned number. Thus, is known. From equations 7.5 and 7.7, it follows that By using the process just described, it can be shown that for k ≥ 1. By definition, equation 7.9 is the deterministic model embedded in the Galton-Watson process.
For the case of a multitype branching process described in section 6, the basic equation of the process is Y iν (7 .10) for genotypes i = 1, 2 and 3, where given X i (t), Y i1 , Y i2 , · · · is a sequence of conditionally independent Poisson random variables for expectation λ i for each genotype i = 1, 2, 3. For each genotype i,it can be shown that Therefore, just as in the case of a one type branching process, the recursive equation will be used to calculate estimates of the sample functions for each genotype i = 1, 2, 3 and t = 0, 1, 2, · · ·.
To take mutation into account in the embedded deterministic model, it will be necessary the use the modified matrix (7.13) which was defined in section 6 in equation 6.10. The symbol o f f indicated that the elements of the matrix have been calculated using estimates of the sample functions of the process. In section 6, the elements of the matrix in 7.13 were calculated as realizations of multinomial random vectors. But, in the embedded deterministic model the rows of the desired matrix, will be calculated as the mean or expectation of a vector with a multinomial distribution.
Consequently, the modified version of the matrix in 7.13 has the form In this equation, X 1 is actually X 1 (t + 1), but for the sake of simplicity, the symbol (t + 1) was not shown in matrix 7.14. Given this matrix, the estimates of the number of each of the three genotypes at time t + 1 are as follows: X ν o f f ν2 (7.15) X ν o f f ν3

Evolution of the Pathogen Population as a Multitype Branching Process
For the sake of simplicity, it will be assumed that the pathogen is a haploid so that each individual of the pathogen population has only one copy of a gene at each locus. In what follows the genotype of an individual in the pathogen population will be denoted by the symbols A and a, which denote avirulence and virulence respectively. According the mutation matrix in 5.1, gene A of the pathogen mutates to gene a with probability ν 12 per generation, and gene a mutates back to gene A with probability ν 21 per generation.
At time t let the random variable W 1 (t) denote the number of individuals of genotype A in the pathogen population, and let the random variable W 2 (t) denote the number of individuals of genotype a in the pathogen population at time t. Let ξ i ( j) for j = 1, 2, · · · denote a sequence of conditionally independent Poison random variables with parameter λ i , where i = 1 or i = 2 denotes the genotype of an individual in the pathogen population. Then the evolution of the number of individuals of genotype 1 = A in the pathogen populate is governed by the recursive equation for t = 0, 1, 2, · · ·. If W 1 (t) = 0, then W (t + 1) = 0 for all t. Similarly, the evolution of the number of individuals in the pathogen population of genotype 2 = a has the form for t = 0, 1, 2, · · ·. To initialize the recursive procedures in equations 8.1 and 8.2, the number of individuals of each genotype in the initial population, W 1 (0) and W 2 (0), must be specified by an experimenter. For each j the expection of the Poison random variables are E ξ j (1) = λ 1 and E ξ j (2) = λ 2 .Because genotype 2 is a, the gene for virulence, the values of λ 1 and λ 2 chosen by an experimenter must satisfy the inequality λ 1 < λ 2 to take into account that virulent genotypes of the pathogen will have many more offspring per generation that an avirulent genotype.
The next step in the formulation of the pathogen process is to take into account that mutations may occur in populations avirulent individuals that are of genotype A. Mutations in individuals of the virulent genotype a will also be included in the formulation. According to the 2 × 2 matrix of mutation probabilities in 5.1 for the pathogen, the mutation probabilities for genotype A are ν 11 and ν 12 , and those for genotype a are ν 21 and ν 22 . Let OA (t) = (n A1 (t) , n a1 (t)) denote a 1 × 2 vector of the numbers of genotypes A and a that are offspring of genotype A at time t + 1. Then, by assumption, the vector OA (t + 1) has a multinomial distribution with sample size W 1 (t + 1) and probability vector p A = (ν 11 , ν 12 ) at time t + 1 in the evolution of the pathogen process. Similarly, let the 2 × 1 vector Oa (t + 1) = (n A2 (t + 1) , n a2 (t + 1)) denote the number of offspring of genotype a with genotypes A and a. By assumption this vector also has a multinomial distribution with sample size W 2 (t) and probability vector p a = (ν 21 , ν 22 ).
To complete the derivation of the process for the pathogen in any generation, it will be helpful for arrange the two vectors just derived in the 2 × 2 matrix n A1 (t + 1) n a1 (t + 1) n A2 (t + 1) n a2 (t + 1) .
From this matrix it can be seen that, when mutation is taken into account, the number of individuals of genotype A in the population at time t + 1 is n A j (t + 1) , (8.4) and the number of individuals of genotype a in the population at time t + 1 is W 2 (t + 1) = 2 j=1 n a j (t + 1) (8.5) By using procedures similar to those in section 7, a deterministic model may be embedded in the pathogen process under consideration. For example, the recursive deterministic model corresponding to equation 8.1 is W 1 (t + 1) = W 1 (t) λ 1 (8.6) http://ijsp.ccsenet.org International Journal of Statistics and Probability Vol. 9, No. 2;2020 for t = 0, 1, 2, · · ·. Similarly, the recursive deterministic model corresponding to equation 8.2 is W 2 (t + 1) = W 2 (t) λ 2 . (8.7) To include mutation in the embedded deterministic model, the probabilities in the mutation matrix ν 11 ν 12 ν 21 ν 22 (8.8) will play a fundamental role. The estimated expectation matrix for the numbers of mutations for the embedded deterministic model is W 1 (t + 1) ν 11 W 1 (t + 1) ν 12 W 2 (t + 1) ν 21 W 2 (t + 1) ν 22 . (8.9) Therefore, it follows that the estimated number of individuals of genotype 1 in the pathogen population at time t + 1 is Similarly, the estimated number of individuals of genotype 2 in the pathogen population at time t + 1 is W 2 (t + 1) = 2 k=1 W k (t + 1) ν k2 . (8.11)

Generation Times of Small Grains, Other Cultivars, Their Pathogens and Balanced Polymorphisms
An essential ingredient of an evolutionary model of small grains, such as wheat and barley, is their generation times. By definition, a generation time, is the time from the germination of seeds to the time that the mature plants produce seeds. For those varieties of wheat and barley that are planted in the spring in northern temperate regions of the earth, the generations times of these species are in the range of 90 for 100 days. The generation times for varieties of flax in northern temperate regions also are in the range of 90 to 100 days. When these species are grown in the southern temperate regions of the earth, the generation times of the species under consideration are essentially the same.
The generation times of the pathogens, which vary among pathogens, are usually much shorter than those of the host.
In the computer experiments reported in this paper, it will be assumed that the generation time of a pathogen is 10 days. Under this assumption, for every generation of the host, there are about 10 generations of the pathogen that will be simulated. Mutations in the host and pathogen often occur during the reproduction process in connection with the copying of DNA in both the host and pathogen when cells divide. Because for every generation of the host, there will be 10 generations of the pathogen, it is more likely that during every growing season, there many more mutations in the pathogen than the host.
After the seeds sprout in the hosts under consideration, there is a rapid increase in biomass, in the forms of leaves, stems and structures connected with reproduction, such as heads that contain the seeds in wheat and barley and structures that contain the seeds in flax. To model the process of plant growth after germination would technically be a very difficult. Consequently, no attempt will be made to model plant growth in this paper. But, it will be tacitly assumed that plant grow does occur in computer simulation experiments and provides a medium on which a pathogen grows and damages the plant. The procedures described in section 1 will be used in all computer simulation experiments to quantify the damage the pathogen does to the host in each generation of a computer experiment.
The term, balanced polymorphisms, is used in evolutionary biology. From the mathematical point of view, it refers to the structure that results form the convergence of a Host-Pathogen model to a limit. For the case of a deterministic model the so called balanced polymorphism will be the constant structure that is the limit of the solution of deterministic equations, describing the evolution of a host-pathogen system as t −→ ∞ when it exits. For the case of a stochastic model describing the evolution of a host-parasite population, in some formulations, such as a Markov processes, the model will converge in distribution to what is called a quasi-stationary distribution. If a reader is interested of examples in which a stochastic model converges to a quasi-stationary distribution, the book by Mode and Sleeman (2012) may be consulted. For a so called quasi -stationary distribution, the mean and variance of the distribution remains constant for all t after convergence. In the case of a stochastic model, the quasi-station distribution may be referred to as a balanced polymorphism. The models described in the foregoing sections are Markov processes.