Geometric Views of Partial Correlation Coefficient in Regression Analysis

By describing the geometric analogues of regression concepts from several perspectives, this work aims to provide a richer and more intuitive understanding of the partial correlation coefficient in regression analysis, especially for beginning students. Based on a simple and strictly correct geometric framework, this article illustrates the partial correlation coefficient from the viewpoints of the Frisch-Waugh-Lovell Theorem, the partial F test statistic, and comparisons with other types of correlation coefficients. In our opinion, the geometric approach sheds light on regression analysis because it gives readers, especially beginners, a more concrete understanding. This paper can also serve as supplementary reading material for serious beginners.


Introduction
Since the subject of regression emerged in the late 19th century, algebra has been widely used to express concepts and build models in regression analysis. Most concepts in regression analysis are traditionally introduced in terms of algebraic equations and matrices. Projection arguments in the form of matrix algebra (Graybill, 1976; Kutner, Nachtsheim & Neter, 2004) are widely used as educational and research tools for high-dimensional modelling in advanced studies of multiple regression analysis. However, it is sometimes challenging for non-math majors or beginners to fully understand the matrix algebra approach, since it requires solid prior knowledge of linear algebra. Geometric interpretation is often more helpful than cumbersome algebraic equations and matrices in understanding regression concepts because its visual presentation is concrete (Margolis, 1979; Bring, 1996). An understanding of the geometrical aspects of elementary regression analysis may assist a student more effectively than elegantly derived formulas (Saville & Wood, 1986).
Despite its merits, geometry is seldom used in teaching regression analysis courses. Besides the long-time predominance of algebra since the 20th century, resistance to abstraction is one of the primary reasons. To reduce beginners' fear of abstraction, an intuitive display of the regression model in a visible three-dimensional space is the key to opening the door to geometric thinking. A multiple regression problem with two predictors is an ideal motivating example for introducing the geometric thinking of regression modeling to beginners, as its layout fits perfectly into a visible space. Furthermore, it serves as an important stepping stone for understanding high-dimensional linear regression models, since all these basic results can be easily extended to a more general linear regression model. By exploring vectors, triangles and projections, and drawing them clearly in such a three-dimensional space, students do not have to delve into complicated algebraic calculations or advanced matrix algebra. Some classical textbooks introduce simple geometric illustrations of basic concepts in regression analysis, including least squares estimation and the simple correlation coefficient (Draper & Smith, 2014). But the geometric interpretation of other important concepts, such as the partial correlation coefficient, the partial regression coefficient and the partial test statistic, is often left out. In fact, students usually have difficulty understanding these concepts. A visual geometric representation is of great help for understanding the partial correlation coefficient and its related concepts, and views from different perspectives provide a richer understanding for students. Based on a simple and strictly correct geometric framework in $E^3$, the objective of this paper is to geometrically illustrate the concept of the partial correlation coefficient from various perspectives.
The rest of the paper is organized as follows. Section 2 first corrects a graphing problem in previous research by mapping the original n-dimensional space $E^n$ to $E^3$ without loss of information. Section 3 geometrically interprets the partial correlation coefficient through the idea of the Frisch-Waugh-Lovell Theorem (Frisch & Waugh, 1933; Lovell, 1963), verifies its relationship with other types of correlation coefficients, and reveals its connection with the partial test statistic. Finally, conclusions and a brief discussion are given in Section 4. Those who have finished an introductory course in econometrics or statistics should find the visual analogues sketched in this paper especially helpful.

Drawing Graphs on $E^3$
Consider a multiple regression model with two independent variables:

$$Y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon, \qquad (1)$$

where $Y$ contains $n$ observations of the response, $X_1$ and $X_2$ are two independent predictors, $\beta_1 \in E^1$ and $\beta_2 \in E^1$ are the regression coefficients, $\varepsilon$ is the error term, and $n$ is the sample size. In geometry, each regression variable is considered as a vector in an n-dimensional space. For example, $\vec{Y} = (y_1, y_2, \dots, y_n)^T$ is the n-dimensional observation vector, the two predictors $\vec{X}_1$ and $\vec{X}_2$ are both n-dimensional column vectors, and $\vec{\varepsilon}$ is also a column vector in the n-dimensional Euclidean space $E^n$. For ease of demonstration, it is common practice to display an n-dimensional vector in a 3-dimensional vector space, as in Figure 1 (Bring, 1996). In Figure 1, $\hat{Y}$ is the orthogonal projection of $\vec{Y}$ on the plane spanned by $\vec{X}_1$ and $\vec{X}_2$, which is denoted by $S(\vec{X}_1, \vec{X}_2)$. The figure also shows the orthogonal projections of $\vec{Y}$ on $\vec{X}_1$ and on $\vec{X}_2$.
Figure 1. Geometric Interpretation of Least Squares Method in n-space.
However, the problem with Figure 1 is that it is drawn in a 3-dimensional space, which contradicts the fact that $\vec{Y}$, $\vec{X}_1$ and $\vec{X}_2$ are vectors in the n-dimensional space $E^n$. According to Saville and Wood (1991), this demonstration is not strictly correct because, in higher dimensions (n > 3), vectors cannot be shown pictorially in a strictly correct manner. To resolve this contradiction, we establish a transformation matrix $A$ that maps the n-dimensional vectors $\vec{Y}$, $\vec{X}_1$ and $\vec{X}_2$ onto the 3-dimensional space. This linear transformation keeps: a) the length of any vector unchanged; b) the angle between any two vectors unchanged.
Pre-multiplying both sides of equation (1) by $A$ leads to the model

$$y = \beta_1 x_1 + \beta_2 x_2 + e, \qquad (2)$$

where $y = A\vec{Y}$, $x_1 = A\vec{X}_1$, and $x_2 = A\vec{X}_2$ are vectors in the 3-dimensional Euclidean space $E^3$, and $e = A\vec{\varepsilon}$. At the same time, in the original n-dimensional space, $\vec{Y}$, $\vec{X}_1$ and $\vec{X}_2$ span a 3-dimensional subspace of $E^n$. This subspace has the same dimension as $E^3$. On the other hand, we know that "any two finite Euclidean spaces are isomorphic if and only if they have the same number of dimensions." Therefore, the subspace is isomorphic to $E^3$. This isomorphism ensures that the results obtained in the new $E^3$ are the same as those obtained by analyzing the original $E^n$. Furthermore, this process ensures the strictness and correctness of drawing any high-dimensional vectors, angles and triangles in $E^3$. In this process, the regression coefficients $\beta_i$ (for $i = 1, 2$) are unchanged. This idea is summarized as follows.
Theorem 1: For any 3 linearly independent vectors $\vec{Y}$, $\vec{X}_1$ and $\vec{X}_2$ in $E^n$, there exists an orthogonal transformation $A$ between $S(\vec{Y}, \vec{X}_1, \vec{X}_2)$ and $E^3$ such that the length of every vector and the angle between every pair of vectors in $S(\vec{Y}, \vec{X}_1, \vec{X}_2)$ are preserved.
In Theorem 1, θ is the angle between $\vec{Y}$ and $\hat{Y}$; the angle between $\vec{X}_1$ and $\vec{Y}$ is named $\alpha_1$; the angle between $\vec{X}_1$ and $\vec{X}_2$ is γ; and the vector $\hat{X}_{21}$ is defined as the predictor of $\vec{X}_2$ when regressing $\vec{X}_2$ on $\vec{X}_1$. See the appendix for the detailed proof. In the following sections, we focus on model (2).
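The length- and angle-preserving map described above can be checked numerically. The following is a minimal sketch, not the construction used in the appendix: it assumes simulated data and builds one possible $A$ by Gram-Schmidt orthonormalization, and the helper names `gram_schmidt` and `apply_map` are hypothetical. The rows of $A$ form an orthonormal basis of the subspace spanned by the three vectors, so inner products within that subspace are unchanged.

```python
import math
import random

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormal basis of the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def apply_map(A, v):
    """Multiply the 3 x n matrix A (a list of rows) by the n-vector v."""
    return [dot(row, v) for row in A]

random.seed(0)  # simulated data, for illustration only
n = 8           # any n > 3 works
Y, X1, X2 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]

# Rows of A: an orthonormal basis of span(X1, X2, Y), ordered so that the
# images of X1 and X2 fall in the plane of the first two coordinates.
A = gram_schmidt([X1, X2, Y])
y, x1, x2 = (apply_map(A, v) for v in (Y, X1, X2))

# Largest change in any inner product; lengths and angles are functions
# of these inner products, so they are preserved as well.
originals = (Y, X1, X2)
images = (y, x1, x2)
max_gap = max(
    abs(dot(u, v) - dot(a, b))
    for u, a in zip(originals, images)
    for v, b in zip(originals, images)
)
print("image of Y in E^3:", y)
print("largest inner-product discrepancy:", max_gap)
```

Because least squares coefficients depend on the data only through inner products such as $X_1^T X_2$ and $X_1^T Y$, a map of this kind also leaves the regression coefficients unchanged.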

The Basic Idea of Correlation Coefficients
A correlation coefficient is a quantitative measure of the linear association between two variables of interest. In regression analysis, there are three classical correlation coefficients: the simple, partial and multiple correlation coefficients. In geometry, it is well known that all three types of correlation coefficients can be expressed as cosines of angles.
First of all, the simple correlation coefficient between any two variables can be expressed as the cosine of the angle between the two vectors that represent the variables. For example, the simple correlation coefficient between $y$ and $x_1$, denoted by $r_{yx_1}$, is the cosine of $\alpha_1$, where $\alpha_1$ is the angle between $\vec{x}_1$ and $\vec{y}$ in Figure 2. The magnitude of the correlation coefficient therefore depends on the angle between the two variable vectors of interest. The closer one vector is to the other (or to its opposite vector), the stronger the linear relationship. In particular, there is no linear association when the two variable vectors are perpendicular to each other.
The multiple correlation coefficient between $y$ and $x_1$ and $x_2$, denoted by $R_{y,x_1x_2}$, is used to measure the goodness of fit of a linear regression model. Similarly, in geometry, it is the cosine of the angle between the response vector $\vec{y}$ and the estimation space spanned by $\vec{x}_1$ and $\vec{x}_2$. In other words,

$$R_{y,x_1x_2} = \cos\theta,$$

where the least squares estimation vector $\hat{y}$ is obtained by orthogonally projecting the observation vector $\vec{y}$ onto the $x_1x_2$ plane, and θ is the angle between $\vec{y}$ and $\hat{y}$ in Figure 2. In a general linear regression model with $p$ predictors (where $p > 2$), the multiple correlation coefficient is the cosine of the angle between the response vector $\vec{y}$ and the estimation space spanned by all predictors.
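The cosine interpretation of the simple and multiple correlation coefficients can be illustrated with a short numerical sketch. It assumes simulated data and assumes that every variable is centered at its mean before being treated as a vector, which the cosine interpretation needs; the helper names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    """Subtract the mean, so that cosines of angles equal correlations."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def cos_angle(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def project(v, u):
    """Orthogonal projection of v onto the line spanned by u."""
    c = dot(v, u) / dot(u, u)
    return [c * a for a in u]

def project_plane(v, u1, u2):
    """Orthogonal projection of v onto the plane spanned by u1 and u2."""
    u2_perp = [a - b for a, b in zip(u2, project(u2, u1))]
    return [a + b for a, b in zip(project(v, u1), project(v, u2_perp))]

def pearson(u, v):
    """Textbook Pearson correlation computed from raw (uncentered) data."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

random.seed(1)  # simulated data, for illustration only
n = 30
raw_x1 = [random.gauss(5, 1) for _ in range(n)]
raw_x2 = [0.5 * a + random.gauss(2, 1) for a in raw_x1]
raw_y = [1.5 * a - 0.7 * b + random.gauss(0, 1)
         for a, b in zip(raw_x1, raw_x2)]
x1, x2, y = center(raw_x1), center(raw_x2), center(raw_y)

# Simple correlation: cosine of the angle between y and x1.
r_yx1 = cos_angle(y, x1)

# Multiple correlation: cosine of the angle between y and its projection
# onto the plane spanned by x1 and x2.
y_hat = project_plane(y, x1, x2)
R = cos_angle(y, y_hat)

# Coefficient of determination from the residual sum of squares.
residual = [a - b for a, b in zip(y, y_hat)]
R2_from_rss = 1 - dot(residual, residual) / dot(y, y)

print("r_yx1 as cosine:", r_yx1, " Pearson formula:", pearson(raw_y, raw_x1))
print("R^2 as squared cosine:", R ** 2, " 1 - RSS/TSS:", R2_from_rss)
```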
On the other hand, the partial correlation coefficient between $y$ and $x_1$, denoted by $r_{yx_1 \cdot x_2}$, measures the linear association between $x_1$ and $y$ after the linear effect of $x_2$ has been removed from both. Conceptually, it is calculated by eliminating the linear effect of $x_2$ on $y$ as well as the linear effect of $x_2$ on $x_1$. To purify $y$ and $x_1$ of the linear influence of $x_2$, we can first regress $x_1$ on $x_2$ and obtain the residual $\hat{e}_1$. Next, we regress $y$ on $x_2$ to obtain the second residual $\hat{e}_2$. Both residual vectors are shown in Figure 2. Then, given $x_2$, the partial correlation coefficient between $y$ and $x_1$ is the simple correlation coefficient between $\hat{e}_2$ and $\hat{e}_1$, or equivalently in Figure 2,

$$r_{yx_1 \cdot x_2} = \cos\varphi_1, \qquad (3)$$

where $\varphi_1$ is the angle between the residual vectors $\hat{e}_1$ and $\hat{e}_2$. This result leads to the following proposition.
Proposition 1: Given that $x_2$ is retained in the model, the partial correlation coefficient between $y$ and $x_1$ is the cosine of the angle between the subspace spanned by $\vec{y}$ and $\vec{x}_2$ and the estimation space spanned by $\vec{x}_1$ and $\vec{x}_2$.
To calculate the partial correlation between $y$ and $x_2$, one simply switches the subscripts of the $x$ vectors. If we define the relevant angle as $\varphi_2$, we have $r_{yx_2 \cdot x_1} = \cos\varphi_2$.
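The two-residual recipe for the partial correlation coefficient can be sketched as follows. This is an illustration under stated assumptions: the data are simulated, all variables are centered at their means, and the function name `partial_corr` is hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def cos_angle(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def residual(v, u):
    """Part of v that is orthogonal to u (residual of regressing v on u)."""
    c = dot(v, u) / dot(u, u)
    return [a - c * b for a, b in zip(v, u)]

def partial_corr(y, x, z):
    """Cosine of the angle between the residuals of y and x after
    each has been regressed on z."""
    return cos_angle(residual(y, z), residual(x, z))

random.seed(2)  # simulated data, for illustration only
n = 30
x1 = center([random.gauss(0, 1) for _ in range(n)])
x2 = center([0.5 * a + random.gauss(0, 1) for a in x1])
y = center([1.5 * a - 0.7 * b + random.gauss(0, 1) for a, b in zip(x1, x2)])

r_y1_given_2 = partial_corr(y, x1, x2)  # cosine of the first partial angle
r_y2_given_1 = partial_corr(y, x2, x1)  # subscripts of the x vectors switched

print("partial correlation of y and x1 given x2:", r_y1_given_2)
print("partial correlation of y and x2 given x1:", r_y2_given_1)
```

Switching the last two arguments of `partial_corr` mirrors the remark that the partial correlation between $y$ and $x_2$ is obtained by switching the subscripts of the $x$ vectors.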

Partial Correlation Coefficient and The Frisch-Waugh-Lovell Theorem
One way to view the partial correlation coefficient is from the perspective of the partial regression coefficient, since both are used to evaluate the contribution of the variable of interest alone, given the remaining variables in the regression model. In fact, the idea of the Frisch-Waugh-Lovell theorem (Frisch & Waugh, 1933; Lovell, 1963), a well-known econometric theorem for estimating the partial regression coefficient, is also helpful for understanding the geometric interpretation in equation (3), since its two-step trend removal procedure discloses the true meaning of the partial correlation coefficient. As an alternative to the direct application of least squares, the Frisch-Waugh-Lovell (FWL) Theorem shows that for the regression model

$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad (4)$$

where $X_1$ and $X_2$ are $n \times k_1$ and $n \times k_2$ design matrices, respectively, the estimate of $\beta_1$, the $k_1 \times 1$ coefficient vector for $X_1$, is the same as its estimate from the modified regression model

$$M_{X_2}Y = M_{X_2}X_1\beta_1 + M_{X_2}\varepsilon, \qquad (5)$$

where $M_{X_2} = I - X_2(X_2^T X_2)^{-1}X_2^T$ projects onto the orthogonal complement of the column space of $X_2$. In the two-predictor setting (2), the second residual $\hat{e}_2$ can be obtained by projecting $y$ onto the orthogonal complement of the column space of $x_2$. That is, $\hat{e}_2 = M_{x_2}y$. Likewise, the first residual is $\hat{e}_1 = M_{x_2}x_1$. According to the FWL theorem, the partial regression coefficient of $x_1$ (i.e. $\hat{\beta}_1$) from regressing $y$ on $x_1$ and $x_2$ simultaneously is obtained simply by regressing $\hat{e}_2$ on $\hat{e}_1$. Consequently, the partial correlation coefficient $r_{yx_1 \cdot x_2}$ is the simple correlation coefficient between $\hat{e}_1$ and $\hat{e}_2$ in the modified regression model (5).
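The FWL statement for the two-predictor case can be verified numerically. The sketch below is an illustration only: it assumes simulated, mean-centered data, solves the two normal equations by Cramer's rule, and uses hypothetical variable names such as `b1_full` and `b1_fwl`.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def residual(v, u):
    """Apply the residual-maker of u to v: remove the projection of v on u."""
    c = dot(v, u) / dot(u, u)
    return [a - c * b for a, b in zip(v, u)]

random.seed(3)  # simulated data, for illustration only
n = 40
x1 = center([random.gauss(0, 1) for _ in range(n)])
x2 = center([0.6 * a + random.gauss(0, 1) for a in x1])
y = center([2.0 * a - 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)])

# Direct least squares: solve the 2 x 2 normal equations by Cramer's rule.
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
s1y, s2y = dot(x1, y), dot(x2, y)
det = s11 * s22 - s12 ** 2
b1_full = (s22 * s1y - s12 * s2y) / det
b2_full = (s11 * s2y - s12 * s1y) / det

# FWL route: purge x2 from both y and x1, then regress residual on residual.
e1 = residual(x1, x2)  # first residual
e2 = residual(y, x2)   # second residual
b1_fwl = dot(e1, e2) / dot(e1, e1)

# The same two residuals give the partial correlation as a cosine.
r_partial = dot(e1, e2) / math.sqrt(dot(e1, e1) * dot(e2, e2))

print("coefficient of x1, full regression:", b1_full)
print("coefficient of x1, FWL two-step:   ", b1_fwl)
print("partial correlation of y and x1 given x2:", r_partial)
```

The slope and the partial correlation share the numerator `dot(e1, e2)`, so they always have the same sign.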

Relationship among Simple, Partial, and Multiple Correlation Coefficients
Another way to examine the partial correlation coefficient is to look into its relationship with the other types of correlation coefficients. This section geometrically studies and verifies the relationship among the simple, partial and multiple correlation coefficients. Three classical equations are chosen to characterize this relationship. These equations have been proved with algebra and matrices, and the proofs can be found in basic statistics or econometrics texts. In this section, however, the derivations rely entirely on simple geometric techniques that are easy to grasp and give readers a richer understanding of correlation coefficients from a different perspective.

$$1 - R^2_{y,x_1x_2} = (1 - r^2_{yx_2})(1 - r^2_{yx_1 \cdot x_2}) \qquad (6)$$

This result first appears in a basic text by Anderson (1958). Here, the coefficient of determination $R^2$ is simply the square of the multiple correlation coefficient. Based on the geometric expressions summarized in the previous section, in Figure 2 the left side of (6) is

$$1 - R^2_{y,x_1x_2} = 1 - \cos^2\theta = \sin^2\theta.$$

Similarly, the right side of (6) is

$$(1 - \cos^2\alpha_2)(1 - \cos^2\varphi_1) = \sin^2\alpha_2\,\sin^2\varphi_1,$$

where $\alpha_2$ is the angle between $\vec{x}_2$ and $\vec{y}$, and $\varphi_1$ is the angle whose cosine is $r_{yx_1 \cdot x_2}$. The residual of $y$ on $x_2$ has length $\|\vec{y}\|\sin\alpha_2$ and makes the angle $\varphi_1$ with the $x_1x_2$ plane, so the residual of $y$ on that plane has length $\|\vec{y}\|\sin\alpha_2\sin\varphi_1 = \|\vec{y}\|\sin\theta$. Then (6) is concluded. Symmetrically,

$$1 - R^2_{y,x_1x_2} = (1 - r^2_{yx_1})(1 - r^2_{yx_2 \cdot x_1}). \qquad (7)$$

Equations (7) and (6) can also be rewritten as

$$R^2_{y,x_1x_2} = r^2_{yx_1} + (1 - r^2_{yx_1})\,r^2_{yx_2 \cdot x_1} \qquad (8)$$

and

$$R^2_{y,x_1x_2} = r^2_{yx_2} + (1 - r^2_{yx_2})\,r^2_{yx_1 \cdot x_2}, \qquad (9)$$

respectively. Equation (8) states that the proportion of the variation in $y$ explained by $x_1$ and $x_2$ jointly is the sum of two parts: the part explained by $x_1$ alone (i.e. $r^2_{yx_1}$) and the part not explained by $x_1$ (i.e. $1 - r^2_{yx_1}$) times the proportion that is explained by $x_2$ after eliminating the influence of $x_1$ (Gujarati, 1995).
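$$r_{yx_1} = R_{y,x_1x_2}\, r_{\hat{y}x_1} \qquad (10)$$

This equation builds a connection between the simple correlation and the multiple correlation. In Figure 2, Section 3.1 suggests that $r_{yx_1} = \cos\alpha_1$ and $R_{y,x_1x_2} = \cos\theta$, where $\alpha_1$ is the angle between $\vec{x}_1$ and $\vec{y}$ and θ is the angle between $\vec{y}$ and $\hat{y}$. Note that

$$\cos\alpha_1 = \cos\theta\,\cos\delta_1,$$

where $\delta_1$ denotes the angle between the vectors $\vec{x}_1$ and $\hat{y}$, so that $r_{\hat{y}x_1} = \cos\delta_1$. Equation (10) is therefore obtained. Symmetrically,

$$r_{yx_2} = R_{y,x_1x_2}\, r_{\hat{y}x_2}. \qquad (11)$$

$$r_{yx_1 \cdot x_2} = \frac{r_{yx_1} - r_{yx_2}\,r_{x_1x_2}}{\sqrt{(1 - r^2_{yx_2})(1 - r^2_{x_1x_2})}} \qquad (12)$$

This is a well-known algebraic formula for obtaining the partial correlation coefficient from simple correlation coefficients.
The spherical triangles method (Thomas & O'Quigley, 1993) shows that equation (12) is identical to a formula of spherical trigonometry, which is illuminating but advanced for some readers. Alternatively, this section proposes a simple version of its geometric proof.
Proof: According to equations (6), (10) and (11), we can conclude that

$$r^2_{yx_1 \cdot x_2} = \frac{R^2_{y,x_1x_2} - r^2_{yx_2}}{1 - r^2_{yx_2}}$$

and

$$R^2_{y,x_1x_2} - r^2_{yx_2} = R^2_{y,x_1x_2}(1 - \cos^2\delta_2) = R^2_{y,x_1x_2}\sin^2\delta_2,$$

where $\delta_2$ is the angle between the vectors $\vec{x}_2$ and $\hat{y}$. In Figure 2, $\delta_1 + \delta_2 = \gamma$, where γ is the angle between $\vec{x}_1$ and $\vec{x}_2$, so that $r_{x_1x_2} = \cos\gamma$ and

$$r_{yx_1} = R_{y,x_1x_2}\cos(\gamma - \delta_2) = r_{yx_2}\cos\gamma + R_{y,x_1x_2}\sin\delta_2\,\sin\gamma, \quad \text{that is,} \quad R_{y,x_1x_2}\sin\delta_2 = \frac{r_{yx_1} - r_{yx_2}\,r_{x_1x_2}}{\sqrt{1 - r^2_{x_1x_2}}}.$$

Taking the square root, with the sign read from Figure 2, equation (12) is concluded.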
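The relationships discussed in this section can be checked numerically. The sketch below is a sanity check under stated assumptions, not a replacement for the geometric proof: the data are simulated and mean-centered, and the three identities are written out in the code itself, namely the product form of $1 - R^2$, the link between the simple and multiple correlations through the fitted vector, and the classical formula for the partial correlation in terms of simple correlations. All names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def cos_angle(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def project(v, u):
    c = dot(v, u) / dot(u, u)
    return [c * a for a in u]

def residual(v, u):
    return [a - b for a, b in zip(v, project(v, u))]

random.seed(4)  # simulated data, for illustration only
n = 50
x1 = center([random.gauss(0, 1) for _ in range(n)])
x2 = center([0.4 * a + random.gauss(0, 1) for a in x1])
y = center([1.0 * a + 0.8 * b + random.gauss(0, 1) for a, b in zip(x1, x2)])

# Simple correlations as cosines of angles between centered vectors.
r_y1, r_y2, r_12 = cos_angle(y, x1), cos_angle(y, x2), cos_angle(x1, x2)

# Fitted vector: projection of y onto the plane spanned by x1 and x2.
e1 = residual(x1, x2)
y_hat = [a + b for a, b in zip(project(y, x2), project(y, e1))]
R = cos_angle(y, y_hat)

# Partial correlation of y and x1 given x2, from the two residual vectors.
r_y1_given_2 = cos_angle(residual(y, x2), e1)

# Identity: 1 - R^2 = (1 - r_y2^2) * (1 - r_{y1.2}^2)
gap_product = (1 - R ** 2) - (1 - r_y2 ** 2) * (1 - r_y1_given_2 ** 2)

# Identity: r_y1 = R * cos(angle between x1 and the fitted vector)
gap_link = r_y1 - R * cos_angle(x1, y_hat)

# Identity: partial correlation from simple correlations
formula = (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))
gap_formula = r_y1_given_2 - formula

print("discrepancies:", gap_product, gap_link, gap_formula)
```

All three discrepancies should vanish up to floating-point rounding.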

Partial Correlation Coefficient and Partial Test Statistic
Like the partial correlation coefficient, the partial F test measures the importance of the variable of interest alone, after the effect of a set of controlling variables has been removed, but from the perspective of hypothesis testing. This section discusses the geometric expression of the partial F test and reveals its geometric connection with the partial correlation coefficient.
Consider the general regression model

$$Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon, \qquad (18)$$

and assume that we are interested in testing the statistical significance of the first predictor $X_1$, that is, in testing $H_0: \beta_1 = 0$. In algebraic methods, the corresponding test statistic $F_1$ is given by

$$F_1 = \frac{(RSS_R - RSS_U)/1}{RSS_U/(n - p - 1)}. \qquad (19)$$

Here, $RSS_U$ is the residual sum of squares of the unrestricted model that regresses $Y$ on $X_1, X_2, \dots, X_p$, while $RSS_R$ is the residual sum of squares of the restricted model that regresses $Y$ on $X_2, \dots, X_p$ only, leaving out $X_1$. Thus, $(RSS_R - RSS_U)$ is the extra sum of squares incurred by omitting $X_1$ from the model. In the geometric approach, the number of degrees of freedom, denoted by $df$, is the number of dimensions in which a vector is free to move. Note that the dimension of the space in which $\vec{Y}, \vec{X}_1, \dots, \vec{X}_p$ lie is $(n - 1)$, while the dimension of $S(\vec{X}_1, \dots, \vec{X}_p)$ is $p$. Consequently, the dimension of the subspace in which the residual vector is free to move is $(n - p - 1)$.
Figure 3 illustrates the geometry of the partial test statistic $F_1$. In Figure 3, the vector $\hat{Y}_1$ is the estimation vector obtained by projecting the observation vector $\vec{Y}$ on the subspace $S(\vec{X}_2, \dots, \vec{X}_p)$. The vector $\hat{Y}$ is the estimation vector obtained by projecting $\vec{Y}$ on the subspace $S(\vec{X}_1, \dots, \vec{X}_p)$. The vectors $\vec{Y} - \hat{Y}$ and $\vec{Y} - \hat{Y}_1$ are the residual vectors from projecting $\vec{Y}$ on the subspace $S(\vec{X}_1, \dots, \vec{X}_p)$ and on the subspace $S(\vec{X}_2, \dots, \vec{X}_p)$, respectively, so that $RSS_U = \|\vec{Y} - \hat{Y}\|^2$ and $RSS_R = \|\vec{Y} - \hat{Y}_1\|^2$. Therefore, it is easy to conclude that

$$F_1 = (n - p - 1)\,\frac{\|\hat{Y} - \hat{Y}_1\|^2}{\|\vec{Y} - \hat{Y}\|^2} = (n - p - 1)\cot^2\varphi_1, \qquad (20)$$

where $\varphi_1$ is the angle between the residual vector of the restricted model and the difference between the residual vectors of the restricted and unrestricted models. Equation (20) discloses the geometric interpretation of the partial F test statistic. A similar result is obtained in a different way by Siniksaran (2005).
In the general regression model (18), $\cos\varphi_1$ is the partial correlation coefficient between $Y$ and $X_1$, given that $X_2, \dots, X_p$ are retained in the model. Writing $r$ for this partial correlation coefficient, the partial F statistic becomes

$$F_1 = (n - p - 1)\,\frac{r^2}{1 - r^2}. \qquad (21)$$

In particular, for model (2) with $p = 2$, the display in Figure 3 is consistent with that of Figure 2, and a simple algebraic manipulation of (21) leads to

$$r^2_{yx_1 \cdot x_2} = \frac{F_1}{F_1 + (n - 3)}. \qquad (22)$$

It is meaningful to study equation (22). On the one hand, it further reveals the relationship between the test statistic and the partial correlation coefficient. On the other hand, it reveals the idea behind the decision rule in hypothesis testing. It is commonly known that $H_0: \beta_1 = 0$ should be rejected at significance level α when $t_1^2 = F_1$ is large enough, say, larger than the critical value $F_{1,(n-3),\alpha}$. Based on equation (22), a large value of $t_1^2$ or $F_1$ implies a large value of $r^2_{yx_1 \cdot x_2}$, which suggests a stronger linear correlation between $y$ and $x_1$, given that $x_2$ is already retained in the model. Consequently, it is more likely that $x_1$ is a significant addition to the prediction of $y$.
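The link between the partial F statistic and the partial correlation coefficient can be illustrated for the two-predictor case. The sketch below rests on assumptions that should be read as such: the data are simulated and mean-centered, the error degrees of freedom are taken as $n - p - 1$ with $p = 2$, and names such as `rss_restricted` are hypothetical. It computes the statistic from the two residual sums of squares and compares it with $(n - 3)\,r^2/(1 - r^2)$, where $r$ is the partial correlation obtained as a cosine.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def project(v, u):
    c = dot(v, u) / dot(u, u)
    return [c * a for a in u]

def residual(v, u):
    return [a - b for a, b in zip(v, project(v, u))]

random.seed(5)  # simulated data, for illustration only
n, p = 30, 2
x1 = center([random.gauss(0, 1) for _ in range(n)])
x2 = center([0.5 * a + random.gauss(0, 1) for a in x1])
y = center([0.8 * a + 1.2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)])

# Restricted model: y on x2 only. Its residual is the second residual e2.
e2 = residual(y, x2)
rss_restricted = dot(e2, e2)

# Unrestricted model: y on x1 and x2. Removing from e2 its projection on
# e1 (x1 purged of x2) leaves the residual of the full regression.
e1 = residual(x1, x2)
e_full = residual(e2, e1)
rss_unrestricted = dot(e_full, e_full)

df_error = n - p - 1
F1 = (rss_restricted - rss_unrestricted) / (rss_unrestricted / df_error)

# Partial correlation as the cosine of the angle between e1 and e2.
r_partial = dot(e1, e2) / math.sqrt(dot(e1, e1) * dot(e2, e2))
F1_from_r = df_error * r_partial ** 2 / (1 - r_partial ** 2)
r2_from_F = F1 / (F1 + df_error)

print("F from sums of squares:     ", F1)
print("F from partial correlation: ", F1_from_r)
print("r^2 recovered from F:", r2_from_F, " direct:", r_partial ** 2)
```

Since `r2_from_F` is increasing in `F1`, a larger statistic always corresponds to a larger squared partial correlation, which matches the decision rule described above.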

Conclusion
To ensure the strictness of drawing n-dimensional vectors, angles and triangles in $E^3$, this article first corrects an existing graphing problem in the current literature. It also geometrically introduces and interprets the concept of the partial correlation coefficient from the perspectives of the FWL theorem, the simple correlation coefficient, the multiple correlation coefficient, and the partial F test statistic. Unlike other pedagogical literature on regression, the geometric analogues of the regression concepts in $E^3$ require neither advanced linear algebra nor abstruse abstract thinking. All the geometric proofs proposed in this article are concise and easy to follow. Demonstrating the basic regression concepts in a visual and familiar three-dimensional space facilitates students' understanding of these concepts, and it serves as an important stepping stone for learning complex regression models in their future studies. It is worth mentioning that the graphical displays in this paper are also suitable for a general multiple regression model with more than two predictors, and the results can be easily generalized. One just needs to hold the vector of the variable of interest while letting the other vector in the estimation space represent the rest of the predictors. Furthermore, in future work, other classical concepts and theorems in regression analysis can also be visualized and interpreted in terms of a few principles of geometry through the geometric structure in this paper.

Proof of Theorem 1
In Figure 1, $\delta_1$ is the angle between $\vec{X}_1$ and $\hat{Y}$; $\delta_2$ is the angle between $\vec{X}_2$ and $\hat{Y}$; and $\alpha_2$ is the angle between the vectors $\vec{X}_2$ and $\vec{Y}$. Therefore, we prove that any pair of row vectors in $A$ is orthogonal.

Figure 2. Geometric Interpretation of Partial Correlation Coefficients