A Bayesian Approach for Asset Allocation

The Black-Litterman model combines investors' personal views with historical data to produce optimal portfolio weights. In this paper we introduce the original Black-Litterman model (Section 1) and modify it to fit a Bayesian framework, by treating the investors' personal views as a direct prior on the means of the returns and by placing a typical Inverse Wishart prior on the covariance matrix of the returns (Section 2). We also consider an idea of Leonard & Hsu [1992] for a prior on the logarithm of the covariance matrix (Section 3). A sensitivity analysis with respect to the level of confidence that investors have in their personal views was performed, and the performance of the models was assessed on a test data set consisting of returns over the month of January 2018.


Black-Litterman Asset Allocation Model
The Black-Litterman asset allocation model, developed by Fischer Black and Robert Litterman in the early 90's while working at Goldman Sachs, has been widely used for decades; it is presented in more detail in the paper by He & Litterman [2002]. Suppose that m assets in the market are considered. The returns of those assets r = (r_1, r_2, . . . , r_m)^T follow a multivariate normal distribution with mean µ and covariance matrix Σ, that is,

r ∼ N_m(µ, Σ). (1)

Black and Litterman proposed the following CAPM (Capital Asset Pricing Model) prior for the mean of the returns:

µ ∼ N_m(π, τΣ), (2)

where π is the equilibrium risk premium, which can be expressed as π = δΣw_eq. Here δ is the investor's risk aversion parameter, w_eq is the vector of m equilibrium weights, and the parameter τ in equation (2) indicates the uncertainty of the CAPM prior.
In addition to the CAPM prior, they also take the investor's views into consideration. Suppose that the investor has k views. His or her views can be expressed as

Pµ ∼ N_k(q, Ω), (3)

where P is a k × m matrix, q is a k × 1 vector, and Ω is a k × k matrix, usually diagonal. Each row of P, together with the corresponding entry of q, represents one personal view. To illustrate all terms in equation (3), we consider an example with four assets: Apple Inc (AAPL), Amazon.com Inc (AMZN), Alphabet Inc Class C (GOOG), and Microsoft Corporation (MSFT). Let µ = (µ_1, µ_2, µ_3, µ_4)^T represent the mean returns for AAPL, AMZN, GOOG, and MSFT, respectively. Suppose that the investor believes that AAPL will outperform AMZN by 2% and that GOOG will have returns that amount to 5%. In that case,

P = [ 1  −1  0  0 ]
    [ 0   0  1  0 ],   q = (0.02, 0.05)^T.

The matrix Ω is usually diagonal. The diagonal elements represent the uncertainty of each view: a small value reflects a high confidence in the view and vice versa. He & Litterman [2002] reported that, by combining the prior information in equations (2) and (3), the mean return follows a normal distribution

µ ∼ N_m(μ̄, M^{-1}), (4)

where

μ̄ = [(τΣ)^{-1} + P^T Ω^{-1} P]^{-1} [(τΣ)^{-1} π + P^T Ω^{-1} q]   and   M = (τΣ)^{-1} + P^T Ω^{-1} P.

The combined prior in equation (4) is a compromise between the proposed CAPM prior in (2) and the investor's views in (3). The combined mean μ̄ is a weighted average of the CAPM prior mean π and the mean q of the investor's views, with weights (τΣ)^{-1} and P^T Ω^{-1} P, respectively. The combined prior mean μ̄ is closer to the CAPM prior mean π when the uncertainty parameter τ is small, or equivalently, when we are more certain about the CAPM prior.
The unconditional distribution of r, obtained by combining equations (1) and (4) and then integrating the vector µ out, is therefore

r ∼ N_m(μ̄, Σ̄), (5)

where Σ̄ = Σ + M^{-1}. Based on the unconditional mean μ̄ and covariance matrix Σ̄ in equation (5), the optimal portfolio can be determined by the standard mean-variance optimization method: maximize w^T μ̄ − (δ/2) w^T Σ̄ w with respect to the weights w, for an investor with risk aversion parameter δ. He & Litterman [2002] reported that the optimal portfolio weights can be expressed as

w* = (1/δ) Σ̄^{-1} μ̄. (6)

Black and Litterman suggested replacing the covariance matrix Σ by a matrix estimated from historical data, and then treated Σ as a known covariance matrix in their model. The optimal portfolio weights w* can be obtained by plugging in all known parameters: the CAPM prior mean π, the uncertainty parameter τ, the personal views parameters P, q, Ω, and the covariance matrix Σ. The model they proposed was a probability model, and the optimal portfolio weights were easily obtained by plugging in all parameters; no data were collected, and only the covariance matrix was obtained from historical data. In this paper, we propose instead a statistical approach, indeed a complete Bayesian statistical approach, which takes the investor's views into consideration. We focus our attention on two cases: (1) when historical data is available, and (2) when historical data is not available.
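The computation in equations (2)-(6) can be sketched numerically. The block below is a minimal illustration only: the values of Σ, w_eq, δ, and τ are hypothetical stand-ins (not from the paper's data), with the views taken from the four-asset example above.

```python
import numpy as np

def black_litterman(Sigma, w_eq, P, q, Omega, delta=2.5, tau=0.05):
    """Combined mean/covariance (eqs. 4-5) and optimal weights (eq. 6)."""
    pi = delta * Sigma @ w_eq                              # CAPM equilibrium mean, eq. (2)
    M = np.linalg.inv(tau * Sigma) + P.T @ np.linalg.inv(Omega) @ P
    mu_bar = np.linalg.solve(M, np.linalg.inv(tau * Sigma) @ pi
                                + P.T @ np.linalg.inv(Omega) @ q)
    Sigma_bar = Sigma + np.linalg.inv(M)                   # eq. (5)
    w_star = np.linalg.solve(Sigma_bar, mu_bar) / delta    # eq. (6)
    return mu_bar, Sigma_bar, w_star

# Hypothetical inputs: 4 assets, 2 views (AAPL beats AMZN by 2%, GOOG returns 5%).
Sigma = np.array([[0.040, 0.010, 0.008, 0.006],
                  [0.010, 0.045, 0.009, 0.007],
                  [0.008, 0.009, 0.035, 0.008],
                  [0.006, 0.007, 0.008, 0.030]])
w_eq = np.array([0.25, 0.25, 0.25, 0.25])
P = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 0.0]])
q = np.array([0.02, 0.05])
Omega = np.diag([1e-4, 1e-4])
mu_bar, Sigma_bar, w_star = black_litterman(Sigma, w_eq, P, q, Omega)
```

As a sanity check, shrinking the diagonal of Ω pushes Pμ̄ toward q, matching the interpretation of Ω as view uncertainty.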

Prior and Posterior Distributions
Let r_1, r_2, . . . , r_n be n independent returns, where each r_i represents the returns of the m assets and follows the distribution specified in equation (1), that is,

r_i ∼ N_m(µ, Σ), for i = 1, 2, . . . , n. (7)

We consider commonly used priors for µ and Σ:

µ ∼ N_m(π, ∆), (8)

and Σ follows an Inverse Wishart distribution with ν degrees of freedom and location parameter Σ_0, that is,

Σ ∼ IW(ν, Σ_0). (9)

Smaller values of the degrees of freedom parameter ν yield a more diffuse distribution, while larger values yield a distribution more highly concentrated about the location matrix parameter Σ_0. Note that the prior for µ in equation (8) is similar to equation (2) used in the Black-Litterman approach, but with the more general covariance matrix ∆ replacing the more restricted τΣ. When historical data is available, we can determine the prior parameters in equations (8) and (9) from that data.
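To illustrate the role of ν in equation (9), the sketch below draws from an Inverse Wishart via the standard Bartlett decomposition (draw W from a Wishart and return W^{-1}), parametrized here so that E[Σ] = Σ_0 for every ν; the particular Σ_0, the two ν values, and the seed are illustrative choices, not values from the paper.

```python
import numpy as np

def rinvwishart(nu, Sigma0, rng):
    """One draw from an Inverse Wishart with E[draw] = Sigma0 (m x m), nu d.o.f."""
    m = Sigma0.shape[0]
    # Underlying Wishart scale chosen so the inverse draw has mean Sigma0.
    L = np.linalg.cholesky(np.linalg.inv((nu - m - 1) * Sigma0))
    A = np.zeros((m, m))
    for i in range(m):
        A[i, i] = np.sqrt(rng.chisquare(nu - i))   # Bartlett: chi-square diagonal
        A[i, :i] = rng.standard_normal(i)          # standard normals below diagonal
    W = L @ A @ A.T @ L.T                          # Wishart draw
    return np.linalg.inv(W)

rng = np.random.default_rng(0)
Sigma0 = np.array([[1.0, 0.3], [0.3, 2.0]])

def spread(nu, ndraws=200):
    # entrywise standard deviation across draws: a crude measure of concentration
    draws = np.stack([rinvwishart(nu, Sigma0, rng) for _ in range(ndraws)])
    return draws.std(axis=0).sum()

s_small, s_large = spread(20), spread(200)   # spread shrinks as nu grows
```

The draws concentrate around Σ_0 as ν increases, matching the statement above.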
We further suppose that the investor's views specified in equation (3) are also available,

Pµ ∼ N_k(q, Ω), (10)

and we would like to include them as part of our prior information. In practice, the two proposed priors on µ from equations (8) and (10) will typically carry inconsistent information. We would like to retain as much information as possible, whether it comes from the investor's views or from historical data. However, we believe that in practice the investor's views are more valuable than the more objective prior information based on historical data, because this gives investors more influence when they use other models to create the inputs for their views.
Let us take a look at the matrix P in detail. Suppose that the investor has k views. Those k views are expressed in equation (10). Each row in P represents a view about the m assets. Those views can be classified as relative views (the rows that sum up to 0) and absolute views (only one 1 in a row). The k views should be linearly independent. If they are not all linearly independent, then some views would be either redundant or inconsistent. As a result, at most k = m linearly independent views can be expressed. In the case when k = m, the matrix P in equation (10) is invertible and only investor's views will be used and the prior information in equation (8) will be automatically ignored. In the case when the number of views k is less than the number of assets m, we will use all information based on investor's views, and additional information based on historical data as much as possible. We now consider an augmented matrix P * based on the matrix P in equation (10), such that the augmented matrix P * is invertible.
Here we present a method for adding rows to P such that the resulting square matrix P* is invertible.
The main idea is based on the way in which one would row reduce a matrix to echelon form. It is well known that a matrix is invertible if and only if its reduced row echelon form is the identity matrix. This suggests taking our matrix P and adding rows to it in order to make it invertible: • For each all-zero column of P, create a new row with a single 1 in that column and 0's in all the others.
• For each row with more than one nonzero entry, and for each such entry outside the pivot columns, create a new row with a single 1 in that entry's column.
For example, if we consider the illustrative matrix P used in Section 1, the above procedure gives

P* = [ 1  −1  0  0 ]
     [ 0   0  1  0 ]
     [ 0   1  0  0 ]
     [ 0   0  0  1 ].

The augmented matrix P* consists of two parts: the original personal views matrix P, and a newly created (m − k) × m matrix P_2 that makes the augmented matrix P* invertible.
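The augmentation procedure above can be sketched in code. Two details are assumptions, since the paper does not fix them: the pivot of each row is taken to be its first nonzero entry, and the added unit rows are appended in increasing column order. The sketch targets simple view matrices like those in this paper.

```python
import numpy as np

def augment(P):
    """Append unit rows to a k x m view matrix P to make it square and invertible."""
    k, m = P.shape
    pivots = {int(np.flatnonzero(row)[0]) for row in P}   # first nonzero of each row
    extra = set()
    for j in range(m):
        if not np.any(P[:, j]):                           # all-zero column -> unit row
            extra.add(j)
    for row in P:
        nz = np.flatnonzero(row)
        if len(nz) > 1:                                   # unit rows for non-pivot entries
            extra.update(int(j) for j in nz if int(j) not in pivots)
    P2 = np.eye(m)[sorted(extra)]                         # the added block P_2
    return np.vstack([P, P2])

P1 = np.array([[1., -1., 0., 0.],
               [0., 0., 1., 0.]])    # the Section 1 example
Pstar = augment(P1)
```

For the Section 1 example this reproduces the 4 × 4 matrix P* displayed above.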
We now transform the data through P*: r*_i = P* r_i for i = 1, 2, . . . , n. Then r*_1, r*_2, . . . , r*_n are independent and

r*_i ∼ N_m(µ*, Σ*), (11)

where µ* = P*µ and Σ* = P*Σ(P*)^T. We further consider the priors in equations (8) and (9), but on µ* and Σ*:

µ* ∼ N_m(π*, ∆*), (12)

and

Σ* ∼ IW(ν, Σ*_0). (13)

When historical data is available, we can objectively specify the prior parameters in equations (12) and (13) in practice, as follows. We calculate the sample covariance matrix of the whole historical data set; that covariance matrix is used as the location parameter Σ*_0, and the number of historical returns is used as the degrees of freedom parameter ν. We then split the historical data into many groups of size n and calculate the sample mean vector of each group. The average of those sample mean vectors is used as π*, and the sample covariance matrix of those sample mean vectors is used as ∆*. Note that, in equation (12), π* and ∆* can be partitioned accordingly,

π* = (π*_1, π*_2)^T,   ∆* = [ ∆*_11  ∆*_12 ]
                            [ ∆*_21  ∆*_22 ],

where π*_1 is a k × 1 vector and ∆*_11 is a k × k matrix. We now impose the investor's views specified in (10), Pµ ∼ N_k(q, Ω), by replacing π*_1 and ∆*_11 with q and Ω, respectively. Therefore,

µ* ∼ N_m(π̃*, ∆̃*), (14)

where

π̃* = (q, π*_2)^T,   ∆̃* = [ Ω      ∆*_12 ]
                          [ ∆*_21  ∆*_22 ]. (15)

We have now successfully combined two sources of prior information: q and Ω according to the investor's views, and π*_2, ∆*_12, ∆*_21, and ∆*_22, objectively specified according to historical data. The sampling distribution of r*_1, r*_2, . . . , r*_n in equation (11) gives the following density:

f(r*_1, . . . , r*_n | µ*, Σ*) ∝ |Σ*|^{-n/2} exp{ −(1/2) Σ_{i=1}^n (r*_i − µ*)^T Σ*^{-1} (r*_i − µ*) }. (16)

Also, we have independent prior distributions on µ* and Σ*, from equations (14) and (13) respectively:

p(µ*) ∝ exp{ −(1/2) (µ* − π̃*)^T ∆̃*^{-1} (µ* − π̃*) } (17)

and

p(Σ*) ∝ |Σ*|^{-(ν+m+1)/2} exp{ −(1/2) tr(Σ*_0 Σ*^{-1}) }. (18)

The joint posterior density of µ* and Σ* given r*_1, r*_2, . . . , r*_n is proportional to the product of equations (16), (17), and (18) and can be represented as

p(µ*, Σ* | r*_1, . . . , r*_n) ∝ |Σ*|^{-(n+ν+m+1)/2} exp{ −(1/2) tr(Σ*_0 Σ*^{-1}) − (1/2) Σ_{i=1}^n (r*_i − µ*)^T Σ*^{-1} (r*_i − µ*) − (1/2) (µ* − π̃*)^T ∆̃*^{-1} (µ* − π̃*) }. (19)

We rearrange the second exponent, then combine the two quadratic functions of µ* in the last two exponents in equation (19).
The joint density becomes

p(µ*, Σ* | r*_1, . . . , r*_n) ∝ |Σ*|^{-(n+ν+m+1)/2} exp{ −(1/2) tr(Σ*_0 Σ*^{-1}) − (1/2) (µ* − μ̂*)^T (nΣ*^{-1} + ∆̃*^{-1}) (µ* − μ̂*) + R(Σ*) }, (20)

where

μ̂* = (nΣ*^{-1} + ∆̃*^{-1})^{-1} (nΣ*^{-1} r̄* + ∆̃*^{-1} π̃*), (21)

r̄* = (1/n) Σ_{i=1}^n r*_i is the sample mean of the transformed returns, and R(Σ*) collects the remaining terms, which do not involve µ*. We can implement a Gibbs sampler (Markov chain Monte Carlo, MCMC) procedure to compute the posterior means of µ* and Σ*. To facilitate the procedure, both full conditional posterior distributions, µ* given Σ* and Σ* given µ*, are needed. Following equation (20), the conditional posterior distribution of µ* given Σ* can be represented as

µ* | Σ*, r*_1, . . . , r*_n ∼ N_m( μ̂*, (nΣ*^{-1} + ∆̃*^{-1})^{-1} ). (22)

That is, conditional on Σ*, the vector µ* follows a normal distribution with posterior mean μ̂* specified in equation (21) and posterior covariance (nΣ*^{-1} + ∆̃*^{-1})^{-1}. Following equation (19), the conditional posterior distribution of Σ* given µ* is

Σ* | µ*, r*_1, . . . , r*_n ∼ IW( ν + n, Σ*_0 + Σ_{i=1}^n (r*_i − µ*)(r*_i − µ*)^T ). (23)

That is, conditional on µ*, the matrix Σ* follows an Inverse Wishart with degrees of freedom ν + n and location matrix parameter Σ*_0 + Σ_{i=1}^n (r*_i − µ*)(r*_i − µ*)^T. The posterior means can be calculated according to Algorithm 1.
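A minimal Gibbs sampler alternating between the two conditionals above can be sketched as follows. The data are synthetic and the prior parameters π̃*, ∆̃*, ν, and Σ*_0 are illustrative stand-ins, not the paper's values; the Inverse Wishart draw uses the standard Bartlett construction.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, nu = 4, 21, 100

def rinvwishart(df, S):
    """Draw from IW(df, S): W ~ Wishart(df, S^{-1}), return W^{-1} (Bartlett)."""
    L = np.linalg.cholesky(np.linalg.inv(S))
    A = np.zeros((m, m))
    for i in range(m):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        A[i, :i] = rng.standard_normal(i)
    return np.linalg.inv(L @ A @ A.T @ L.T)

# Synthetic transformed returns and illustrative prior parameters.
rstar = rng.multivariate_normal(np.full(m, 0.001), 1e-4 * np.eye(m), size=n)
rbar = rstar.mean(axis=0)
pi_t = np.zeros(m)                 # stand-in for the combined prior mean (q, pi*_2)
Delta_t = 1e-3 * np.eye(m)         # stand-in for the combined prior covariance
Sigma0 = 1e-4 * nu * np.eye(m)     # stand-in location matrix

mu, Sigma = rbar.copy(), np.cov(rstar.T)
mus, Sigmas = [], []
for t in range(2000):
    # mu* | Sigma*: normal, eqs. (21)-(22)
    prec = n * np.linalg.inv(Sigma) + np.linalg.inv(Delta_t)
    V = np.linalg.inv(prec)
    V = (V + V.T) / 2              # symmetrize against round-off
    mean = V @ (n * np.linalg.inv(Sigma) @ rbar + np.linalg.inv(Delta_t) @ pi_t)
    mu = rng.multivariate_normal(mean, V)
    # Sigma* | mu*: Inverse Wishart, eq. (23)
    resid = rstar - mu
    Sigma = rinvwishart(nu + n, Sigma0 + resid.T @ resid)
    if t >= 500:                   # discard burn-in draws
        mus.append(mu)
        Sigmas.append(Sigma)
mu_hat, Sigma_hat = np.mean(mus, axis=0), np.mean(Sigmas, axis=0)
```

The posterior mean estimates μ̂* and Σ̂* are simply the averages of the retained draws, as in Algorithm 1.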

Implementation
The simulations in this section were done a couple of years ago, which is the reason why the data ends in 2017, as we will soon see. At that time, downloading closing prices was easily done using the quantmod package in R. For illustration purposes, 4 stocks were considered: Apple (AAPL), Amazon (AMZN), Alphabet Inc Class C (GOOG), and Microsoft (MSFT). Closing prices for those 4 stocks from January 2nd 2015 to May 1st 2017 were considered and the returns were computed. This data is split into two parts: one representing the current data (the last n returns r_1, r_2, . . . , r_n, here n = 21) and the rest representing historical data used to determine the prior parameters of the model. We chose n = 21 because we are modeling the returns over a period of approximately a month, and 21 is the average number of trading days in a month. For this example, suppose that the investor believes that AAPL will outperform AMZN by 2% and that GOOG will outperform MSFT by 5%. The investor's views are represented as

P = [ 1  −1  0   0 ]
    [ 0   0  1  −1 ],   q = (0.02, 0.05)^T,

where the columns of P represent the four stocks AAPL, AMZN, GOOG, MSFT, respectively. The following augmented matrix P* was created according to the procedure suggested in Section 2.1:

P* = [ 1  −1  0   0 ]
     [ 0   0  1  −1 ]
     [ 0   1  0   0 ]
     [ 0   0  0   1 ].
We split the historical data into groups of 21 returns each. The sample mean vector was calculated for each group. Then the average and the covariance matrix of those mean vectors were used to determine the entries of the vector π* and the covariance matrix ∆* in equation (15), alongside the q and Ω already supplied by the investor's views.
A burn-in period of 10^3 iterations was chosen, and the number of iterations for the Gibbs sampler is 10^4. After the Gibbs sampler is completed, one only has to take the mean of the simulated µ*^(t), call it μ̂*, and the average of the simulated Σ*^(t), call it Σ̂*. The covariance estimate obtained from the Gibbs sampler in this example is

Σ̂* = [  8.867 × 10^{-5}   −1.423 × 10^{-5}   −7.090 × 10^{-5}   −2.234 × 10^{-6} ]
      [ −1.423 × 10^{-5}    9.588 × 10^{-5}    1.517 × 10^{-5}   −1.385 × 10^{-5} ]
      [ −7.090 × 10^{-5}    1.517 × 10^{-5}    9.562 × 10^{-5}    2.400 × 10^{-5} ]
      [ −2.234 × 10^{-6}   −1.385 × 10^{-5}    2.400 × 10^{-5}    4.488 × 10^{-5} ].

However, one has to remember that these estimates live in the space transformed by P*; we must therefore transform them back to the original space: μ̂ = P*^{-1} μ̂*, Σ̂ = P*^{-1} Σ̂* P*^{-T}. As shown before, the weights are computed according to w* = (1/δ) Σ̂^{-1} μ̂, where δ = 2.5, as chosen in the original Black-Litterman model. There has also been extensive research on the choice of δ (please see Janecek [2004]); for trading stocks, a risk aversion coefficient between 2 and 3 is reasonable. The weights are normalized so that the sum of the absolute values of their entries is 1. Also, please note that a negative weight is possible; in finance it corresponds to short selling. Short selling 1 share of Google is done by borrowing 1 share of Google from the market maker and selling it on the market instantly, with the promise of buying it back and returning it to the market maker at some future date.
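The back-transformation and weight normalization can be sketched as follows. Σ̂* is the estimate reported above, while μ̂* is a made-up placeholder, since the paper's posterior mean values are not reproduced here.

```python
import numpy as np

Pstar = np.array([[1., -1., 0.,  0.],
                  [0.,  0., 1., -1.],
                  [0.,  1., 0.,  0.],
                  [0.,  0., 0.,  1.]])
Sigma_star = np.array([[ 8.867e-5, -1.423e-5, -7.090e-5, -2.234e-6],
                       [-1.423e-5,  9.588e-5,  1.517e-5, -1.385e-5],
                       [-7.090e-5,  1.517e-5,  9.562e-5,  2.400e-5],
                       [-2.234e-6, -1.385e-5,  2.400e-5,  4.488e-5]])
mu_star = np.array([0.02, 0.05, 0.001, 0.001])    # hypothetical placeholder values

Pinv = np.linalg.inv(Pstar)
mu = Pinv @ mu_star                                # back to the original asset space
Sigma = Pinv @ Sigma_star @ Pinv.T
w = np.linalg.solve(Sigma, mu) / 2.5               # w* = (1/delta) Sigma^{-1} mu
w = w / np.abs(w).sum()                            # normalize so sum of |w_i| = 1
```

Negative entries of w correspond to short positions, as discussed above.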

Sensitivity Analysis
In the analysis presented in this section, we use the same data set and the same investor view inputs (P and q). In practice, it is of great interest to see how sensitive the models are to changes in the confidence levels that the investor inputs (Ω). Our intuition about the world says that: • The more confident the investor is in the inputted views, the closer the model should follow them. • The less confident the investor is in the inputted views, the closer the model should follow history. This intuition should be reflected in the model assumptions and also in the results. The following remark shows that it is: Remark 1. Since Pµ ∼ N_k(q, Ω), we have that lim_{Ω→O_2} Pµ = q a.s., where O_2 is the 2 × 2 zero matrix. Therefore, as the diagonal entries of Ω (which are squared standard deviations, i.e., variances) get smaller and smaller, we expect Pµ to get closer and closer to q.
This remark also suggests the way in which we conduct the sensitivity analysis. For both models, an exhaustive method was implemented that computes, for each pair of diagonal entries of Ω, a posterior mean μ̂*. This is transformed back to the original space of returns: μ̂ = P*^{-1} μ̂*. Once this is obtained, the distance ||Pμ̂ − q||_2 can be calculated for both models. The following graphs have the two diagonal entries of Ω on two of the axes, while the third axis represents the Euclidean distance ||P P*^{-1} μ̂* − q||_2. As ω_1 (confidence in the first view, labeled o1 in the figure) and ω_2 (confidence in the second view, labeled o2 in the figure) decrease, the distance gets closer and closer to 0. As o1 and o2 increase, the distance plateaus at a certain value (determined by the history).
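The exhaustive sweep over the diagonal of Ω can be sketched with the conjugate update from equation (4); the grid, π, τΣ, P, and q below are illustrative stand-ins for the paper's inputs, and both diagonal entries are varied together for brevity.

```python
import numpy as np

def combined_mean(P, q, Omega, pi, tau_Sigma):
    """Combined prior mean, eq. (4): weighted average of pi and the views."""
    prec = np.linalg.inv(tau_Sigma) + P.T @ np.linalg.inv(Omega) @ P
    return np.linalg.solve(prec, np.linalg.inv(tau_Sigma) @ pi
                                 + P.T @ np.linalg.inv(Omega) @ q)

P = np.array([[1., -1., 0., 0.],
              [0., 0., 1., -1.]])
q = np.array([0.02, 0.05])
pi = np.array([0.005, 0.004, 0.006, 0.005])    # stand-in equilibrium mean
tau_Sigma = 0.05 * (1e-2 * np.eye(4))          # stand-in tau * Sigma

grid = [10.0 ** e for e in range(-8, 1)]       # omega_1 = omega_2 swept over a log grid
dist = [np.linalg.norm(P @ combined_mean(P, q, np.diag([w, w]), pi, tau_Sigma) - q)
        for w in grid]
# dist shrinks toward 0 for small omegas and plateaus for large ones
```

The resulting distances reproduce the qualitative shape of the surfaces described above: near 0 at high confidence, a plateau at low confidence.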

Introduction
A very interesting idea for a different prior on the covariance matrix is presented by Leonard & Hsu [1992] and by Albert et al. [2000]. Again, suppose that r_1, r_2, . . . , r_n are n independent returns, where each r_i follows the normal distribution specified in equation (7), r_i ∼ N_m(µ, Σ) for i = 1, 2, . . . , n. The likelihood function for µ and Σ is

L(µ, Σ | r_1, . . . , r_n) = (2π)^{-nm/2} |Σ|^{-n/2} exp{ −(1/2) Σ_{i=1}^n (r_i − µ)^T Σ^{-1} (r_i − µ) }.

Let A = log(Σ) = (a_ij), i, j ∈ {1, 2, . . . , m}, and S = (1/n) Σ_{i=1}^n (r_i − µ)(r_i − µ)^T. The likelihood of µ and A can then be represented as

L(µ, A | r_1, . . . , r_n) ∝ exp{ −(n/2) tr(A) − (n/2) tr(exp(−A) S) }. (24)

Here we define the operator Vec*(·) that stacks into a vector the entries of a matrix parallel to the main diagonal. For example,

α = Vec*(A) = (a_11, a_22, . . . , a_mm | a_12, a_23, . . . , a_{m−1,m} | . . . | a_1m)^T.

Leonard & Hsu [1992] approximated the likelihood of µ and A in equation (24) using Bellman's iterative solution (please see Bellman [1997]) to the linear Volterra integral equation, and reported the corresponding approximate likelihood function for α = Vec*(A) = Vec*(log(Σ)) and µ:

L(µ, α | r_1, . . . , r_n) ≈ exp{ −(1/2) (α − λ)^T Q (α − λ) }, (25)

up to a factor not depending on α. Here λ = Vec*(log(S)), and the d × d symmetric, almost surely positive definite matrix Q is the likelihood information matrix of α, given explicitly in equation (26) as a function of the eigenvalues and normalized eigenvectors of S, where d = (1/2) m(m + 1). We notice that the approximate likelihood function (25) has a multivariate normal form with respect to α: it is proportional to a d = (1/2) m(m + 1) dimensional multivariate normal density with mean vector λ and covariance matrix Q^{-1}. This functional form of the approximate likelihood function in equation (25) will be the driving mechanism in the Bayesian analysis for α.
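The Vec* operator and the matrix logarithm are easy to make concrete. The sketch below computes λ = Vec*(log(S)) via an eigendecomposition for a small illustrative S (an arbitrary positive definite matrix, not from the paper's data).

```python
import numpy as np

def vec_star(A):
    """Stack the entries of A parallel to the main diagonal:
    (a11, ..., amm | a12, ..., a_{m-1,m} | ... | a1m)."""
    m = A.shape[0]
    return np.concatenate([np.diag(A, k) for k in range(m)])

def logm_sym(S):
    """Matrix logarithm of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.log(vals)) @ vecs.T

S = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
lam = vec_star(logm_sym(S))    # lambda in equation (25); length d = m(m+1)/2 = 6
```

For symmetric A, Vec* keeps exactly the d = m(m+1)/2 entries on and above the diagonal, which is why α has dimension d.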

The Model
In the case when historical information is not available and we do not have substantial prior information about the covariance matrix Σ, we consider a vague prior for α = Vec*(log(Σ)) = (α_1, α_2, . . . , α_m, α_{m+1}, . . . , α_d)^T, where d = (1/2) m(m + 1). The prior is specified in two stages: • In the first stage, given θ_1, σ²_1, θ_2, and σ²_2, the diagonal elements of A = log(Σ), namely α_1, α_2, . . . , α_m, which correspond to the variance components of the covariance matrix Σ, each follow an independent normal distribution with common mean θ_1 and common variance σ²_1. The off-diagonal elements of A = log(Σ), namely α_{m+1}, α_{m+2}, . . . , α_d, which correspond to the covariance components of Σ, each follow an independent normal distribution with common mean θ_2 and common variance σ²_2. The diagonal and off-diagonal elements are independent of each other.
• In the second stage, we assume independent diffuse priors for θ_1, θ_2 and for σ²_1, σ²_2, respectively. In addition to our priors for α, we include the investor's views, expressed in equation (28). The parameters θ_1 and θ_2 can be integrated out of the joint density of α and θ = (θ_1, θ_2)^T given σ²_1 and σ²_2, which yields a normal prior for α with precision matrix G, specified in equation (31). Combining the approximate likelihood in equation (25), the prior for the covariance matrix in equation (30), and the investor's views in equation (28), we obtain the approximate joint distribution in equation (32). By completing the square, we can combine the two quadratics in α from the exponential, and we obtain the approximate conditional posterior of α given σ²_1, σ²_2, and µ:

α | σ²_1, σ²_2, µ ≈ N_d( α*, (Q + G)^{-1} ), (33)

where α* = (Q + G)^{-1} Q λ. That is, α is approximately normally distributed with mean α* and covariance matrix (Q + G)^{-1}, where Q and G are specified in equations (26) and (31), respectively.
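The posterior mean α* = (Q + G)^{-1} Q λ is a precision-weighted shrinkage of the likelihood mean λ. The small check below uses arbitrary positive definite stand-ins for Q and G (the paper's Q and G come from equations (26) and (31)) to illustrate the completing-the-square result.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 6
B = rng.standard_normal((d, d))
Q = B @ B.T + d * np.eye(d)        # arbitrary PD stand-in for the information matrix
G = 0.5 * np.eye(d)                # arbitrary PD stand-in for the prior precision
lam = rng.standard_normal(d)       # stand-in for lambda = Vec*(log(S))

alpha_star = np.linalg.solve(Q + G, Q @ lam)   # posterior mean, eq. (33)
post_cov = np.linalg.inv(Q + G)                # posterior covariance
```

As G → 0 (a fully diffuse prior), α* → λ, i.e., the posterior mean reverts to the approximate maximum-likelihood value.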
(Algorithm 2, step 6: compute f_ij^{(t+1)} by identifying the coefficients of the entries of the matrix log Σ^{(t+1)}.)
If we look at Figure 3, we notice that in this version of the model the distance converges to 0 very fast as o1 (ω_1 in the model) and o2 (ω_2 in the model) go to 0. We also notice that, as o1 and o2 get bigger, the distance converges very fast to a stable plateau. This is consistent with our intuition: if we are very confident in our views, the model should put much more weight on them, while if we are not confident at all, the model should rely only on the history. Indeed, if we use only the history, the unbiased estimator for µ is the sample mean of the returns, r̄, and the distance becomes ||Pr̄ − q||_2 = 0.05388875. We proceed by looking at the performance, over the month of January 2018, of a portfolio obtained using this model trained on the same daily returns between January 2nd 2014 and December 29th 2017. For these analyses, an initial investment of $100,000 was considered, without any commissions, capital buffers for short selling, etc. Just as before, in order to obtain the portfolio, we estimate via Gibbs sampling the posterior mean μ̂ and the posterior covariance Σ̂. The portfolio weights that maximize the posterior portfolio mean while minimizing the posterior variance (risk) are w = (1/2.5) Σ̂^{-1} μ̂. Using those weights, we compute the profits or losses that would be obtained over the month of January 2018 (daily returns between January 2nd 2018 and January 30th 2018) with the initial investment of $100,000. One could also use a different investment horizon here.
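The profit calculation described above can be sketched as follows, assuming daily rebalancing to fixed weights and no trading frictions, as in the text; the weights and the returns matrix below are synthetic placeholders, not the paper's estimates.

```python
import numpy as np

def profit(w, daily_returns, initial=100_000.0):
    """P&L of a portfolio rebalanced daily to weights w (sum of |w_i| = 1),
    ignoring commissions and short-selling buffers."""
    wealth = initial
    for r in daily_returns:            # r: vector of asset returns for one day
        wealth *= 1.0 + float(w @ r)
    return wealth - initial

w = np.array([0.4, -0.2, 0.3, -0.1])                  # placeholder weights, sum |w| = 1
rng = np.random.default_rng(3)
jan_returns = rng.normal(0.001, 0.01, size=(21, 4))   # placeholder test-month returns
pnl = profit(w, jan_returns)
```

A negative weight contributes positively to the P&L on days when that asset's return is negative, which is how the short positions above pay off.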
The same P, q, grid for ω_i, burn-in period, and number of iterations were used as before. Figure 4 is a 3D plot of the sensitivity of the profits to changes in the investor's confidence. We observe a profit approximately between $10,000 and $58,000. In order to interpret this curve, we need to know what actually happened in the month of January 2018 relative to the views inputted. More specifically, over the month of January 2018,

P r̄_{Jan 2018} = (0.23996743, 0.01366718)^T.

Although the first view inputted is a tenth of what happened in reality (AMZN outperformed AAPL by almost 24% in January 2018), the profits curve still gives more importance to this view than to the second one. Indeed, profits increase drastically as we decrease ω_1 while keeping ω_2 constant.
A 24% gain in a month is an extreme scenario, so let us consider a different stock instead of AMZN. We replace AMZN with FB (Facebook) and keep all the inputs the same as before, except that we input 3 different values for q. In Figure 5 we present the results for profits when the investor considers q = (0.02, 0.05)^T (a random guess). With FB in place of AMZN, the first view was much smaller in reality than when we considered AMZN instead of FB (the first view in reality was 24% with AMZN in). This is reflected in Figure 5, where we notice that the second view now has a greater influence on the profits curve than what we have seen in Figure 4.
• If we compare Figures 6 and 7, we notice that they seem to be reflections of each other with respect to a plane parallel to the "o1 vs o2" plane. This makes sense, since the only difference between the two is that in Figure 6 we have q = (0.06212815, 0.01366718)^T (exactly reality) and in Figure 7 we have q = (−0.06212815, 0.01366718)^T (the opposite of reality).

Conclusion
We have seen that our model follows our intuition: the more confident the investor is in their views, the closer the model follows them, and the less confident the investor is in their views, the more the model follows history. Moreover, in the version containing the Leonard-Hsu prior, the profit curve when the investor is lucky and inputs views exactly as they will happen (please see Figure 6) is a mirror image of the one obtained when they input the opposite of what will happen (please see Figure 7).
In our next paper we will introduce another full Bayesian version of Black-Litterman, but this time for large data sets. The motivation to move to a large number of variables is that an investor might want to use data for the whole market (S&P 500), despite having very few views (for example, 2, as presented in this paper). Our next version will introduce a Bayesian factor model in order to reduce the dimension. This is necessary because the matrix Q, defined in equation (26), is of size d × d = m(m+1)/2 × m(m+1)/2 and is randomly generated at each iteration of a Gibbs sampler. Therefore, if one considers the whole S&P 500, the size of this matrix in terms of memory would be of