Dynamic Correlation Multivariate Stochastic Volatility Black-Litterman With Latent Factors

In finance, it is often of interest to study market volatility for portfolios that may consist of a large number of assets using multivariate stochastic volatility models. However, such models, though useful, do not usually incorporate investor views that might be available. In this paper we introduce a novel hierarchical Bayesian methodology for modeling the volatility of a large portfolio of assets that incorporates the investor's personal views of the market via the Black-Litterman (BL) model. We extend the scope and use of BL models by embedding one within a multivariate stochastic volatility model that uses latent factors for dimensionality reduction while allowing for time-varying correlations. Detailed derivations of the MCMC algorithm are provided, with an illustration using S&P 500 asset returns. Moreover, a sensitivity analysis for the confidence levels that the investor has in their personal views is also explored. Numerical results show that the proposed method provides flexible interpretation based on the investor's uncertainty in their personal beliefs, and converges to the empirical sample estimate when the investor's confidence in their views becomes weak.


Introduction
In finance, portfolio optimization is one of the quintessential practices, and often the goal is to dynamically adjust the portfolio allocation scheme depending on the evolving market volatility. A majority of the stochastic models for asset allocation either use historical data to account for market volatility through a static model (e.g., using multivariate normal distributions after suitable transformations of the returns) or use dynamic models based on various versions of so-called stochastic volatility models that allow the market volatility to vary over time. However, neither of these approaches takes into account the investor's opinions or experience for future asset allocation along with the historical data. Almost three decades ago, (Black & Litterman, 1992) introduced a novel idea that allows one to incorporate the investor's views, with inputted degrees of uncertainty, into standard portfolio theory based on Markowitz models. One of the primary goals of this paper is to embed the Black-Litterman (BL henceforth) model into a dynamic correlation multivariate stochastic volatility (DCMSV henceforth) model (see, e.g., (Ku et al., 2014)).
The Black-Litterman model is a portfolio allocation model developed by Fischer Black and Robert Litterman in the early 1990s while they were working for Goldman Sachs. The article published by (He & Litterman, 2002) provided sufficient mathematical details of the BL model that paved the way for other researchers. Previous research on the Black-Litterman model centered around various optimization methods (Idzorek, n.d.; Bertsimas et al., 2012). The main idea is to solve the optimality condition for the Markowitz mean-variance portfolio allocation based on the investor's personal beliefs. It is not clear how to generalize this approach to other constrained asset allocation problems. In the past decades, more and more researchers realized that the model could be embedded into a Bayesian framework because, as we will see in the next section, equations (1) and (2) are actually Bayesian priors on the means of the returns. One of the authors has two papers using precisely this Bayesian approach to Black-Litterman (see (Andrei & Hsu, 2020) and (Andrei & Hsu, 2021)). Once this observation is made, there is a myriad of other approaches: some from purely statistical points of view and others coming from finance. Very few incorporate time while combining ideas from finance and Bayesian statistics (Avramov & Zhou, 2010; Polson & Tew, 2000). In those papers, the time dynamics are embedded through a factor multivariate regression model.
In Section 2, we provide a more detailed background of the BL model before we introduce our proposed version, which combines the BL model with the DCMSV model. As the BL model is predicated on using the investor's opinions, it is prudent to consider a full Bayesian hierarchical framework that uses formal prior distributions on the historical mean values of the returns to express (a priori) views that investors have about future asset performance. In this paper, we propose a novel BL-DCMSV model based on latent factors that incorporates the investor's views as a prior on the mean of the asset returns. Our full Bayesian approach also allows the covariance matrix of the returns to be stochastic and evolve over time. Our method has several advantages over existing methods. First, we can directly incorporate the investor's preferences and market insights into the model; this is one of the main reasons the BL model became popular in the finance industry. Second, by using a hierarchical Bayesian framework, we avoid the ill-conditioned estimates that arise in high-dimensional applications; this is a well-known problem in estimating large covariance matrices with the sample covariance matrix. Regularization techniques are often used to address it (Ledoit & Wolf, 2004; Wang & Zou, 2010; Zou et al., 2018), but they introduce tuning-parameter selection problems of their own. Third, our Bayesian framework naturally incorporates the uncertainty of the investor's personal beliefs, and converges to the empirical sample estimate when the investor's confidence in their views becomes weak, as illustrated in the numerical studies in Section 5. This feature allows an easy stress test under different expert input scenarios.
The paper is organized as follows. In Section 3, we present an extension of the DCMSV model based on latent factors that incorporates the investor's views as a prior on the mean parameter of the returns. In Section 4, we present the details of the Markov Chain Monte Carlo (MCMC) methods that we employ to obtain the parameter estimates. Section 5 includes numerical illustrations of our algorithm using daily returns from the whole S&P 500, and finally we conclude with some remarks on possible alternatives and extensions of our models in Section 6.

Black-Litterman Asset Allocation Model
Consider a portfolio consisting of p assets; the goal is to obtain an asset allocation model based on the return values. Let the return value for the i-th asset be denoted by r_i = (P_{1i} − P_{0i})/P_{0i}, where P_{1i} denotes the current price of the i-th stock and P_{0i} the corresponding initial price of that stock. Based on standard practice in finance, let us assume that the vector of asset returns, denoted by r = (r_1, r_2, ..., r_p)^T, follows a multivariate normal distribution:

r | µ ∼ N_p(µ, Σ). (1)

Black and Litterman proposed the following Capital Asset Pricing Model (CAPM henceforth) prior for the mean of the return:

µ ∼ N_p(π, τΣ). (2)

In finance, π is called the vector of market equilibrium returns, and it can be found using the formula

π = δ Σ w_eq,

where the parameter δ is the investor's risk aversion parameter, w_eq is the vector of p equilibrium weights, and the parameter τ in equation (2) indicates the uncertainty in the CAPM prior. It has been shown in (Cheng, 2008) that one can derive the above equation using the assumptions of the traditional Capital Asset Pricing Model together with the extra assumptions represented by equations (1) and (2).
In addition to the CAPM prior, Black and Litterman also take the investor's views into consideration. Suppose that the investor has v views. His or her views can be expressed in the equation:

Pµ ∼ N_v(q_0, Ω_0), (3)

where the matrix P is a v × p matrix, q_0 is a v × 1 vector, and Ω_0 is a v × v matrix, usually diagonal. Each row of P, together with the corresponding entry of q_0, represents one personal view. By combining equations (2) and (3) it follows that when Ω_0 has very small diagonal elements, one may view µ as having a multivariate normal distribution approximately supported on the plane Pµ ≈ q_0. To illustrate all terms in equation (3), we consider an example with p = 4 assets: Apple Inc (AAPL), Facebook Inc. (FB), Alphabet Inc Class C (GOOG) and Microsoft Corporation (MSFT). Let µ = (µ_1, µ_2, µ_3, µ_4)^T represent the vector of mean returns for AAPL, FB, GOOG, and MSFT, respectively. Suppose that the investor has v = 2 views and believes that FB will outperform AAPL by 2% and that GOOG will have returns that amount to 5%. In this case,

P = [ −1 1 0 0 ; 0 0 1 0 ],  q_0 = (0.02, 0.05)^T.

Recall that the matrix of views P is of size v × p (see equation (3)). In the simulations we ran for the whole S&P 500, the number of stocks actively traded on a daily basis is 488. Therefore, in practice, the matrix P can have 2 rows (corresponding to v = 2 views) and 488 columns (a column for each stock). Just as in the example above, the matrix P can have very few nonzero entries. The covariance matrix Ω_0 is usually a diagonal matrix whose diagonal elements represent the variance of each view: a small value reflects high confidence in the view and a large value reflects low confidence in the view.
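As a concrete illustration of how the view inputs are assembled, the following Python sketch (using numpy) encodes the two example views above; the Ω_0 entries are illustrative choices, not values from the paper:

```python
import numpy as np

# Assets in order: AAPL, FB, GOOG, MSFT (p = 4), with v = 2 views.
# View 1: FB outperforms AAPL by 2%  -> row (-1, 1, 0, 0), q = 0.02
# View 2: GOOG has a return of 5%    -> row ( 0, 0, 1, 0), q = 0.05
P = np.array([[-1.0, 1.0, 0.0, 0.0],
              [ 0.0, 0.0, 1.0, 0.0]])
q0 = np.array([0.02, 0.05])

# Diagonal Omega_0: a small variance encodes high confidence in a view.
# These particular variances are purely illustrative.
Omega0 = np.diag([1e-4, 1e-2])

print(P.shape)  # v x p, here (2, 4)
```

For the full S&P 500 experiment described above, P would instead have 488 columns, almost all of them zero.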
In (He & Litterman, 2002), the authors discussed the model in detail. They reported that, by combining the prior information in equations (2) and (3), the mean of the return follows a normal distribution

µ ∼ N_p(μ̄, M^{−1}), with μ̄ = M^{−1}[(τΣ)^{−1}π + P^T Ω_0^{−1} q_0] and M = (τΣ)^{−1} + P^T Ω_0^{−1} P. (4)

Notice that the combined prior mean μ̄ is closer to the CAPM prior mean π when the uncertainty parameter τ is small, or equivalently, when we are more certain about the CAPM prior.
The unconditional (marginal) distribution of r is therefore given by

r ∼ N_p(μ̄, Σ̄), (5)

which is obtained by combining equations (1) and (4) and then integrating the vector µ out, where Σ̄ = Σ + M^{−1}. Based on the unconditional mean μ̄ and the covariance matrix Σ̄ in equation (5), the optimal portfolio can be determined by using the standard mean-variance optimization method, maximizing w^T μ̄ − (δ/2) w^T Σ̄ w with respect to the weights w for an investor with risk aversion parameter δ. (He & Litterman, 2002) reported that the optimal portfolio weights w* can be expressed as

w* = (δ Σ̄)^{−1} μ̄. (6)

The optimal portfolio weights w* can thus be obtained by plugging in the CAPM prior mean π, the uncertainty parameter τ, the personal views parameters P, q_0, Ω_0, and the covariance matrix Σ estimated from historical data. Moreover, from a statistical point of view, the investor's views prior (equation (3)) and the CAPM prior (equation (2)) are inconsistent, since they place two separate priors on the same mean vector µ.
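The quantities μ̄, Σ̄ and w* can be computed mechanically from the He-Litterman formulas. Below is a minimal Python sketch assuming numpy; the default δ and τ values are illustrative assumptions, not values from the paper:

```python
import numpy as np

def black_litterman(Sigma, w_eq, P, q0, Omega0, delta=2.5, tau=0.05):
    """Combined mean, combined covariance, and optimal weights.

    pi        = delta * Sigma @ w_eq                 (CAPM equilibrium returns)
    M         = (tau*Sigma)^{-1} + P' Omega0^{-1} P  (combined precision)
    mu_bar    = M^{-1} [(tau*Sigma)^{-1} pi + P' Omega0^{-1} q0]
    Sigma_bar = Sigma + M^{-1}
    w_star    = (delta * Sigma_bar)^{-1} mu_bar
    """
    pi = delta * Sigma @ w_eq
    tS_inv = np.linalg.inv(tau * Sigma)
    O0_inv = np.linalg.inv(Omega0)
    M = tS_inv + P.T @ O0_inv @ P
    mu_bar = np.linalg.solve(M, tS_inv @ pi + P.T @ O0_inv @ q0)
    Sigma_bar = Sigma + np.linalg.inv(M)
    w_star = np.linalg.solve(delta * Sigma_bar, mu_bar)
    return mu_bar, Sigma_bar, w_star
```

As the view variances in Omega0 grow, mu_bar collapses to the CAPM prior mean pi, mirroring the remark on the combined prior mean above.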
Instead, in this paper, we propose a statistical approach, indeed a hierarchical Bayesian statistical approach, for a large number of stocks. A full Bayesian factor model with stochastic volatility models on each common factor is introduced. The purpose of the factor model is to reduce the dimension of the problem; otherwise, we would encounter computational and memory allocation problems, as we will see later in the implementation and results sections. The purpose of the stochastic volatilities on the common factors is to model the time-varying covariance matrix of the returns. We keep the investor's views prior, since we still want investors to be able to input their personal views, but we drop the inconsistent CAPM prior, which contained only historical information. In Section 5, we will see in our simulations using daily returns of the whole S&P 500 from January 2nd, 2014 to December 29th, 2017 that our Bayesian version still uses historical data, especially when the investor is not at all confident in their views (which matches our intuition).

Dynamic Correlation Multivariate Stochastic Volatility Black-Litterman With Latent Factors
Let y_t, for t = 1, 2, ..., denote the vector of returns at time t relative to the previous time point t − 1. More specifically, the i-th component y_{i,t} of y_t is obtained as (P_{i,t} − P_{i,t−1})/P_{i,t−1}, where P_{i,t} denotes the stock price of the i-th asset at time t. One of the most limiting aspects of the Black-Litterman model, as introduced in the previous section, is that it assumes y_t ∼ N_p(µ, Σ), i.e., the covariances among the returns are fixed over time, and Σ is typically estimated by the sample covariance matrix based on, say, T time points. However, when the number of assets is large (e.g., p = 488 for the actively traded stocks in the S&P 500) and we use only a year's worth of data (T = 250 trading days), the sample covariance estimate of the p × p matrix Σ is poor. We therefore take a different approach and generalize equation (1) of the original Black-Litterman model by allowing Σ to vary with time, utilizing the framework of (Ku et al., 2014) but introducing an additional mean parameter that incorporates the investor's views, represented by equation (3). Moreover, using a Bayesian inferential framework that allows for (low-dimensional) latent factors, we show that even when p > T, inference about the asset allocation can be carried out within such a generalized model, which not only incorporates the investor's views but also allows the correlations to vary dynamically with time t.

The DCMSV-BL Model
The main idea used by (Ku et al., 2014) is that even when we consider a large portfolio consisting of p assets, there might be only a few, q << p, latent market factors driving the prices of the p assets. As illustrated by (Ku et al., 2014), often q = 2 or 3 suffices even when p ≥ 50 in many practical scenarios. Of course, this is a strong assumption, and it is generally not known how many latent factors q are necessary to adequately capture the cross-correlations among the assets, but the model allows for trying different values, and a predictive criterion is then used to determine the number of latent factors. For simplicity, we describe the model for a fixed number q << p of latent factors, denoted by f_t (a q × 1 vector); with a p × q loading matrix B, the proposed model can be expressed as follows:

y_t = µ + B f_t + e_t,  e_t iid ∼ N_p(0, Ω),  Ω = diag(σ_1^2, ..., σ_p^2). (7)
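To make the factor structure y_t = µ + B f_t + e_t concrete, here is a minimal Python simulation of a static version of the model; all dimensions and parameter values are illustrative, and the stochastic volatility on f_t is replaced by iid standard normal factors for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, T = 10, 2, 250              # p assets, q << p latent factors, T days

mu = rng.normal(0.0005, 0.001, size=p)             # mean returns (illustrative)
B = rng.normal(size=(p, q))                        # p x q loading matrix
Omega = np.diag(rng.uniform(1e-4, 4e-4, size=p))   # diagonal idiosyncratic variances

# y_t = mu + B f_t + e_t with e_t ~ N_p(0, Omega); the factors are drawn
# iid N_q(0, I) here for simplicity, whereas the model in the paper places
# stochastic volatility priors on f_t instead.
f = rng.normal(size=(T, q))
e = rng.multivariate_normal(np.zeros(p), Omega, size=T)
y = mu + f @ B.T + e                               # T x p matrix of returns
print(y.shape)                                     # (250, 10)
```

Under this static simplification the implied covariance of each y_t is B B^T + Ω, which is the structure the latent factors exploit for dimension reduction.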
On this mean µ, we place the Black-Litterman views prior represented previously by equation (3):

Pµ ∼ N_v(q_0, Ω_0). (8)

Next, we introduce stochastic volatility models as priors on the common factors f_t, with the time-varying covariance parameters Q_t modeled using so-called inverse Wishart autoregressive models: (i) an autoregressive inverse Wishart process on Q_t, for t = {0, 1, ..., T − 1}, with starting point Q_0 = I_q, governed by the parameters A, d and k; and (ii) an AR(1) process on the volatilities h_t = (h_{t1}, ..., h_{tq})^T, for t = {1, 2, ..., T}. The above specifications provide the conditional sampling distributions of the return vectors and the corresponding prior distributions. The joint posterior distribution of the dynamic and fixed parameters is proportional to the product of the probability density functions of equations (7), (8), (9), (10) and (11). Clearly, the exact posterior distributions of the parameters cannot be obtained in closed form and, therefore, we employ the well-known Markov Chain Monte Carlo (MCMC) methods for inference. In the next few subsections, we derive the full conditionals and then the full MCMC sampling scheme.

Full Conditional Distribution for f t
In order to obtain the posterior for f_t, we collect the terms that depend on it, which come from equations (7) and (9).

International Journal of Statistics and Probability, Vol. 10, No. 2, 2021

We combine the two exponentials and expand the quadratic term in the first one; inside the exponential, multiplied by −1/2, we are left with a quadratic form in f_t. Since we will make use of the following identity repeatedly, which can be proven by expanding the right-hand side, let us introduce it here:

Lemma 1. For any n-dimensional vectors x, b and for any n × n invertible symmetric matrix M, the following holds:

x^T M x − 2 b^T x = (x − M^{−1} b)^T M (x − M^{−1} b) − b^T M^{−1} b.

Coming back to our equation and applying Lemma 1 to the quadratic form in f_t (remembering that this whole term is multiplied by −1/2 inside an exponential, exp{·}), we obtain exactly the kernel of a multivariate normal distribution, which is the full conditional for f_t.
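In implementation terms, Lemma 1 says that any kernel of the form exp{−(1/2)(x^T M x − 2 b^T x)} is a N(M^{−1}b, M^{−1}) kernel, so this full conditional (and those for B and µ later) reduces to the same Gaussian draw. A small Python sketch, assuming numpy, with M and b standing in for the paper's specific expressions:

```python
import numpy as np

def complete_square_gaussian(M, b, rng):
    """Draw x ~ N(M^{-1} b, M^{-1}) given a precision matrix M and vector b.

    By the completing-the-square identity (Lemma 1), the kernel
    exp{-(1/2)(x' M x - 2 b' x)} is, up to a constant, the density of
    N(M^{-1} b, M^{-1}). We use the Cholesky factor of M rather than
    forming M^{-1} explicitly.
    """
    L = np.linalg.cholesky(M)          # M = L L'
    mean = np.linalg.solve(M, b)       # M^{-1} b
    z = rng.standard_normal(len(b))
    u = np.linalg.solve(L.T, z)        # Cov(u) = (L L')^{-1} = M^{-1}
    return mean + u
```

Working with the precision M directly is convenient here because the full conditionals in this section are all derived in precision form.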

Full Conditional Distribution for B
Again, the only distributions that depend on B are equation (7) and the prior that we introduce on each entry of the matrix, so the kernel of the posterior for B depends only on the corresponding exponential terms. The difficulty in combining the two sums is that, as written, the first one runs over time, while the other two run over what we may call "space" (i.e., in this case, the p variables and the q factors). In order to combine the sums into a single one, we need them to be indexed in the same way.
The first step is to expand the quadratic term in the exponential over the rows (the p entries of y_t) and to show that the following identity holds:

Σ_{t=1}^T (y_t − µ − B f_t)^T Ω^{−1} (y_t − µ − B f_t) = Σ_{r=1}^p (1/σ_r^2) Σ_{t=1}^T (y_{rt} − µ_r − B_r·^T f_t)^2. (14)

Here, the index running over rows is r, and B_r· denotes the r-th row of the matrix B written as a column vector. The identity holds because Ω is diagonal, so the quadratic form decouples across the p coordinates; examining one of the terms inside the sum on the left-hand side makes this explicit. Hence, the identity in equation (14) holds. Similarly, starting from the right-hand side, we can also show that the inner sum over time can be written as a single quadratic form in B_r· (equation (15)). For simplicity, we use the following notation: (i) y = (y_1, ..., y_T) is now a p × T matrix whose columns index time and whose rows index the variables.
(ii) F = (f_1, ..., f_T) is a q × T matrix whose columns are the common factors.
(iii) B_r· is the r-th row of the matrix B written as a column vector.
Hence, using the identities in equations (14) and (15), we have re-written the first sum in the exponential from equation (13). We are left with re-writing the second sum, which comes from the prior on the entries of B. Once both sums are indexed in the same way (over the rows r), we can re-write the exponential in the posterior for B from equation (13). The kernel of the posterior for B depends only on the terms that contain B, so the first quadratic term, which does not involve B, is a constant for this posterior. Now we can apply Lemma 1 with x = B_r·, M equal to (1/σ_r^2) F F^T plus the prior precision matrix, and b = (1/σ_r^2) F (y_r· − µ_r 1), where 1 denotes the vector of ones. This implies that the posterior distribution of each row B_r· of the loading matrix is multivariate normal with mean M^{−1} b and covariance M^{−1}.
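The row-wise full conditional above translates into a simple Gibbs update for B. The Python sketch below (assuming numpy) takes the prior on each entry of B to be N(0, tau_B); this specific prior variance is an assumption made here for concreteness:

```python
import numpy as np

def sample_B_rows(y, F, mu, sigma2, tau_B, rng):
    """Gibbs update of the loading matrix B, one row at a time.

    y is p x T, F is q x T (columns are the factors f_t). Given F, row r
    of B is a linear regression y_{r,t} = mu_r + B_r.' f_t + e_{r,t} with
    e_{r,t} ~ N(0, sigma2[r]), so its full conditional is Gaussian.
    A N(0, tau_B * I) prior on each row is assumed for concreteness.
    """
    p, T = y.shape
    q = F.shape[0]
    B = np.empty((p, q))
    for r in range(p):
        M = F @ F.T / sigma2[r] + np.eye(q) / tau_B   # posterior precision
        b = F @ (y[r] - mu[r]) / sigma2[r]            # precision-weighted data term
        L = np.linalg.cholesky(M)
        mean = np.linalg.solve(M, b)                  # posterior mean M^{-1} b
        B[r] = mean + np.linalg.solve(L.T, rng.standard_normal(q))
    return B
```

Because each row is updated independently given F, this step parallelizes trivially across the p assets.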

Full Conditional Distribution for µ
Similarly to what we did in the previous section, the only distributions in the likelihood that depend on µ are equations (7) and (8). Hence, by a similar argument as before, we focus on the kernel of µ. To simplify the formulas, introduce the notation y*_t = y_t − B f_t, and focus only on the terms in the exponential, without the multiplication factor −1/2. We notice that the first sum (which was multiplied by −1/2 inside an exponential) is the likelihood of a sample y*_1, ..., y*_T iid ∼ N_p(µ, Ω). Hence, we can use the following well-known identity for this first sum:

Lemma 2. With ȳ* = (1/T) Σ_{t=1}^T y*_t, the following equality holds:

Σ_{t=1}^T (y*_t − µ)^T Ω^{−1} (y*_t − µ) = Σ_{t=1}^T (y*_t − ȳ*)^T Ω^{−1} (y*_t − ȳ*) + T (ȳ* − µ)^T Ω^{−1} (ȳ* − µ).

The first sum on the right-hand side does not depend on µ and, just as before, we focus only on the kernel. After expanding the remaining quadratic terms, we apply Lemma 1 again with x = µ, M = T Ω^{−1} + P^T Ω_0^{−1} P and b = T Ω^{−1} ȳ* + P^T Ω_0^{−1} q_0, and we obtain that the posterior for µ is the kernel of a normal distribution:

µ | · ∼ N_p(M^{−1} b, M^{−1}), with M and b as above.
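This full conditional is where the Black-Litterman views enter the sampler: the precision M = T Ω^{−1} + P^T Ω_0^{−1} P and b = T Ω^{−1} ȳ* + P^T Ω_0^{−1} q_0 blend the data mean with the views. A Python sketch, assuming numpy and a diagonal Ω as in the model:

```python
import numpy as np

def sample_mu(y_star, Omega_diag, P, q0, Omega0, rng):
    """Draw mu from its Gaussian full conditional.

    y_star is T x p with rows y_t - B f_t. Combining the likelihood with
    the views prior P mu ~ N_v(q0, Omega0) gives precision
    M = T Omega^{-1} + P' Omega0^{-1} P and
    b = T Omega^{-1} ybar* + P' Omega0^{-1} q0, so mu ~ N(M^{-1} b, M^{-1}).
    """
    T, p = y_star.shape
    ybar = y_star.mean(axis=0)
    O_inv = np.diag(1.0 / Omega_diag)     # Omega is diagonal in the model
    O0_inv = np.linalg.inv(Omega0)
    M = T * O_inv + P.T @ O0_inv @ P
    b = T * O_inv @ ybar + P.T @ O0_inv @ q0
    L = np.linalg.cholesky(M)
    return np.linalg.solve(M, b) + np.linalg.solve(L.T, rng.standard_normal(p))
```

When the view variances in Omega0 are huge, the draw concentrates around ȳ*, which is exactly the convergence-to-history behavior studied in Section 5.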

Full Conditional Distribution for Q t
Again, the only distributions that depend on Q_t, for t = {1, ..., T − 1}, are the distribution of the factors at time t and the inverse Wishart transition densities from Q_{t−1} to Q_t and from Q_t to Q_{t+1}; multiplying them gives the kernel of the posterior for Q_t. Substituting this into the last exponential, multiplying and dividing by exp{(1/2) ε_t^T Q_t^{−1} ε_t}, and remembering that we introduced the notation Σ_t = diag(Q_t)^{−1/2} Q_t diag(Q_t)^{−1/2}, we obtain the kernel of the full conditional.

Remark 1. Notice that some of the quadratic terms are scalars: since (1/2) ε_t^T Q_t^{−1} ε_t ∈ R, and since Tr(·) is invariant under cyclic permutations, each such scalar quadratic form can be re-written as a trace. Using those identities in our previous equation, we obtain that the posterior for Q_t, for t = {1, ..., T − 1}, is distributed as an inverse Wishart times another term. This is exactly what was reported in (Asai & McAleer, 2009).
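The transformation Σ_t = diag(Q_t)^{−1/2} Q_t diag(Q_t)^{−1/2} used throughout this section simply rescales a positive definite Q_t into a correlation matrix. A small Python helper (assuming numpy) makes this explicit:

```python
import numpy as np

def corr_from_Q(Q):
    """Dynamic correlation matrix Sigma_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}.

    Rescaling a positive definite Q by its diagonal yields a valid
    correlation matrix: unit diagonal, off-diagonal entries in [-1, 1].
    """
    d = 1.0 / np.sqrt(np.diag(Q))
    return Q * np.outer(d, d)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Q = A @ A.T + 3 * np.eye(3)   # an arbitrary positive definite "Q_t"
S = corr_from_Q(Q)
print(np.diag(S))             # unit diagonal by construction
```

This is why the autoregressive process can be placed on the unconstrained Q_t while the factors still receive a proper time-varying correlation matrix.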

Full Conditional Distribution for A
The only distributions that depend on A are the inverse Wishart transition densities. Hence, by multiplying all of them, focusing only on the kernel of A, and using the property that Tr(·) is additive and invariant under cyclic permutations, just as we did in the previous section when we computed the posterior for Q_t, we obtain that the posterior for A is the same as the one reported by (Asai & McAleer, 2009).

Full Conditional Distribution for d
The only distributions that depend on d are the inverse Wishart transition densities. Hence, by multiplying all of them, focusing on the kernel of d, and using the fact that Tr(·) is invariant under cyclic permutations, we obtain its full conditional. When writing down this probability density function (pdf), we also keep the implementation in mind. It turns out that this is not the pdf of any commonly known distribution, and in order to sample from it we need an algorithm such as Adaptive Rejection Metropolis Sampling. There already exists a pre-programmed function in R called arms() which takes as input the log(·) of the pdf. Therefore, by also preparing the pdf for implementation, we obtain the log posterior for d.
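The paper samples d (and later k) with R's arms(); as a stand-in, a plain random-walk Metropolis step also needs only the log of an unnormalized pdf. The Python sketch below (assuming numpy) is illustrative: the Gamma(3, rate 2) target is an arbitrary example, not the actual full conditional for d:

```python
import numpy as np

def rw_metropolis(log_pdf, x0, n, step, rng):
    """Random-walk Metropolis for a univariate log-density.

    Like arms(), it only needs log_pdf up to an additive constant;
    working on the log scale avoids numerical under/overflow.
    """
    x, lp = x0, log_pdf(x0)
    out = np.empty(n)
    for i in range(n):
        prop = x + step * rng.standard_normal()
        lp_prop = log_pdf(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
        out[i] = x
    return out

def log_gamma32(x):
    # Illustrative target: Gamma(shape 3, rate 2) log-density, up to a constant.
    return 2.0 * np.log(x) - 2.0 * x if x > 0 else -np.inf

rng = np.random.default_rng(1)
draws = rw_metropolis(log_gamma32, 1.0, 20000, 0.8, rng)
print(draws.mean())   # should be close to the true mean, 3/2
```

In practice ARMS adapts its proposal to the target and typically mixes better than a fixed-step random walk, which is why the paper relies on arms() for these two nonstandard full conditionals.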

Full Conditional Distribution for k
Similarly to what we did before, the only distributions that depend on k are the inverse Wishart transition densities. By multiplying them, focusing on the kernel of k, and using the cyclic invariance and the additivity of Tr(·), we obtain its full conditional. The q-variate gamma function is connected to the univariate one by the following identity:

Γ_q(a) = π^{q(q−1)/4} ∏_{j=1}^q Γ(a + (1 − j)/2).

Hence, if we apply the identity above with a = k/2 and focus only on the kernel of k, we obtain the full conditional for k. Again, this posterior is not from a common distribution, and in the implementation we use the same Adaptive Rejection Metropolis Sampling that we used when sampling from the posterior for d. For this, we need the ln(·) of the probability density function.

Full Conditional Distribution for σ_i^2

The distributions that depend on σ_i^2, for i = {1, ..., p}, are equation (7) and the prior on σ_i^2. Since Ω is diagonal, from the first assumption we obtain the likelihood contribution of each σ_i^2 separately. Together with the fact that σ_i^2 indep.
∼ InvGamma(α_0, β_0), we obtain that the posterior for σ_i^2 is again an inverse gamma distribution.

MCMC Sampler for the BL-DCMSV Model

The full MCMC sampler cycles through the full conditionals derived above: f_t, B, µ, Q_t, A and σ_i^2 are drawn from their multivariate normal, inverse Wishart and inverse gamma full conditionals, while d and k are sampled using the arms() function in R. By running this MCMC sampler, we obtain samples µ^{(j+1)} from the posterior of the mean of the returns. In practice, the investor is interested in a portfolio of stocks with an optimal vector of weights w* such that w^T 1 = 1, maximizing the posterior mean of the portfolio return (i.e., w^T µ) while minimizing the risk (the variance w^T Σ^{(j+1)} w). Notice that, under the assumptions of our model represented by equations (7), (8), (9), (10) and (11), the theoretical covariance matrix of the returns is

Cov(y_t) = B Cov(f_t) B^T + Ω.

In the last equation we used the fact that the common factors and the errors are independent, which in this case is equivalent to uncorrelated since they are both multivariate normal. Moreover, using assumption (9) and the tower property, we know that E[f_t] = 0, and using assumption (7) we also know that E[e_t] = 0. Hence, we obtain that indeed Cov(f_t, e_t) = E[f_t e_t^T] − 0 = 0. Therefore, the investor can obtain a posterior sample of optimal weights by solving the following quadratic programming problem at each step inside the MCMC sampler:

max_w  w^T µ^{(j+1)} − (δ/2) w^T Σ^{(j+1)} w  subject to  w^T 1 = 1. (18)
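With only the budget constraint w^T 1 = 1, the per-iteration quadratic program (maximize w^T µ − (δ/2) w^T Σ w) has a closed-form solution via a single Lagrange multiplier, so no numerical QP solver is strictly needed. A Python sketch, assuming numpy, with δ and any inputs being illustrative:

```python
import numpy as np

def optimal_weights(mu, Sigma, delta):
    """Maximize w' mu - (delta/2) w' Sigma w subject to w' 1 = 1.

    Stationarity gives w = (1/delta) Sigma^{-1} (mu + lam * 1); plugging
    this into 1' w = 1 yields
    lam = (delta - 1' Sigma^{-1} mu) / (1' Sigma^{-1} 1).
    """
    ones = np.ones_like(mu)
    Si_mu = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
    Si_1 = np.linalg.solve(Sigma, ones)    # Sigma^{-1} 1
    lam = (delta - ones @ Si_mu) / (ones @ Si_1)
    return (Si_mu + lam * Si_1) / delta
```

Inside the sampler, calling this once per iteration with the draws µ^{(j+1)} and Σ^{(j+1)} yields a posterior sample of optimal weight vectors; negative weights (short positions) are allowed, matching the unconstrained shorting noted in Section 5.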

Numerical Illustrations
Just as we did in the example from Section 2, we choose the same 4 stocks (AAPL, FB, GOOG, MSFT), with daily returns from January 2nd, 2014 to December 29th, 2017. We use the following inputs, where the columns are in order AAPL, FB, GOOG, MSFT and the rows represent the views (FB outperforms AAPL by 2%; GOOG outperforms MSFT by 5%):

P = [ −1 1 0 0 ; 0 0 1 −1 ],  q_0 = (0.02, 0.05)^T.

We ran the MCMC sampler with 3 parallel chains, each for 2000 iterations, and carried out the standard convergence diagnostics, as can be seen in the Gelman, trace and density plots of the posterior mean µ. The MCMC sampler takes approximately 27 hours to run for the desired level of empirical convergence. Clearly, this approach is computationally expensive; therefore, the sensitivity analysis was run in parallel on 144 cores at once (each core running the MCMC sampler for one pair of confidence levels (ω_1, ω_2)).
How do we conduct a sensitivity analysis for the confidence that the investor has in his or her views (the diagonal elements of the Ω_0 matrix, denoted by ω_i here)? Intuitively, we would expect the model to behave in the following way: (i) the more confident the investor is in the inputted views, the closer the model should follow them; (ii) the less confident the investor is in the inputted views, the closer the model should follow history. This also suggests the way in which we conduct the sensitivity analysis. An exhaustive method was implemented that computes, for each pair of diagonal entries in Ω_0 (i.e., investor confidence levels), a posterior mean μ̄. Using this, we can compute the Euclidean distance ||Pμ̄ − q_0||_2. Figure 3 has, as 2 of its axes, the 2 diagonal entries of Ω_0, while the third axis represents the distance ||Pμ̄ − q_0||_2. Figure 4 depicts the same distance obtained by using the model from (Andrei & Hsu, 2021) on the same data set, with the same inputs P and q_0.

Figure 3. ||Pμ̄ − q_0||_2 for different confidence levels. Figure 4. Distances from (Andrei & Hsu, 2021)

Both 3D plots confirm that the models work according to our intuition: (i) the smaller o1/o2 in the plots (ω_1/ω_2 in the theory), the more confident the investor is in the view, since it is actually a variance. We can see in both 3D plots that the smaller o1/o2, the smaller the distance ||Pμ̄ − q_0||_2 becomes.
(ii) the larger o1/o2 (or ω_1/ω_2), the less confident the investor is in the view since, again, it is actually a variance. We can see in both 3D plots that the larger o1/o2, the larger the distance ||Pμ̄ − q_0||_2 becomes, and it plateaus at a certain value (the historical distance ||Pȳ − q_0||_2 = 0.0539).
The only difference between the two plots is that the distance in the model presented in this paper converges to history more slowly than the distances from (Andrei & Hsu, 2021). This slower convergence to history is preferred in practice, because the investor would like a more gradual mix between the historical data and the personal opinions as the levels of confidence in those opinions change.
Let us also consider a testing data set consisting of the month immediately following our historical data, which, in this case, is January 2018, and an initial capital of $100,000. Over this month, we had the following returns: r_FB = 3.141879%, r_AAPL = −3.070936%, r_GOOG = 9.266661% and r_MSFT = 7.899943%. Hence, in reality, the first view was r_FB − r_AAPL = 6.212815% (we inputted 2%) and the second view was r_GOOG − r_MSFT = 1.366718% (we inputted 5%). For each combination of confidence levels (ω_1, ω_2) in those 2 views, we compute the optimal portfolio weights using the procedure described in equation (18). Using those weights, we can easily compute the profits/losses for each combination of investor confidence levels. The mean across all pairs (ω_1, ω_2) is $105,294,405, the minimum is $70,232,381 and the maximum is $154,101,151. Keep in mind that those estimates do not include any commissions, capital requirements for short positions (one could even short 500% of a certain stock), etc.
Similar conclusions to the ones we derived from Figure 3 can be drawn if we plot the prior density of the distance ||Pµ − q_0||_2 together with the posterior density of the distance ||Pμ̄ − q_0||_2 for different confidence levels. Sampling from the prior is easily done if we make the following observation: under the views prior in equation (3), Pµ − q_0 ∼ N_v(0, Ω_0), so prior draws of the distance are obtained by drawing a N_v(0, Ω_0) vector and taking its Euclidean norm.

Figure 5. Density of ||Pμ̄ − q_0||_2 for 10^{−2} confidence in both views

As we start from Figure 5 and move to Figure 11, we sequentially go from the very confident investor to the least confident investor. We notice again that the model indeed works according to our intuition: (i) as the investor's level of confidence decreases, the two densities move further and further apart; (ii) as the investor's level of confidence decreases, the standard deviation of the posterior density becomes smaller and smaller. This is because the model puts more and more importance on history, as we saw in the 3D plot in Figure 3.

Conclusions
In this paper, we propose a novel hierarchical Bayesian model based on latent factors that incorporates the investor's views as a prior on the asset returns, while allowing the covariance structure of the asset returns to be stochastic and dynamic. The advantages of our method are threefold. First, we can directly incorporate the investor's preferences and market insights into the model via the BL model widely used in the financial industry. Second, we avoid ill-conditioned estimates for high-dimensional applications by using a hierarchical Bayesian approach. Third, our Bayesian framework naturally incorporates the uncertainty of the investor's personal beliefs, and converges to the empirical sample estimate when the investor's confidence in their views becomes weak. This feature facilitates an easy stress test under different expert input scenarios.
In the future, we plan to investigate sensitivity analysis for the profits/losses of holding a portfolio optimized using this alternative to BL, including margin requirements for short positions.