Study of the Relationship between Dependent and Independent Variable Groups by Using Canonical Correlation Analysis with Application

Canonical correlation analysis is used to study the relationship between two groups of variables (dependent and independent). Each group is represented by a linear combination of its variables, and canonical correlation analysis finds the linear combinations of one group that correlate maximally with linear combinations of the other group. The statistical analysis involves the canonical correlations between the two groups of variables, the canonical variates, the standardized canonical variates, the canonical factor loadings, and the canonical cross factor loadings for both groups. A test of significance of the canonical correlations using Wilks' lambda showed that the first and second canonical correlations were significant and the third and fourth canonical correlations were not significant. The method is illustrated by using a real data set. Results were obtained by using the SPSS program.


Introduction
Canonical correlation analysis (see Weenink, 2003; Fan & Konold, 2010; Vía et al., 2007) is one of the multivariate statistical analysis methods. It measures the strength of the overall relationship between the linear structures (canonical variables) of the dependent and the independent variables. It is a bivariate correlation between two canonical variables, for example: a group of personal variables and a group of potential measures, a group of price indices and a group of production indices, a group of psychological characteristics and a group of physiological characteristics, or a group of academic achievement variables and a group of measures of business success (Rencher, 2002).
Canonical correlation analysis is a generalization of the concept of regression analysis, but rather than describing a relationship between one variable $Y$ and a group of variables $X_1, X_2, \ldots, X_q$, the canonical correlation measures the relationship between a group of independent variables $X_1, X_2, \ldots, X_q$ and another group of dependent variables $Y_1, Y_2, \ldots, Y_p$ (Hair, 2009).
Further statistically sound methods in the field are based on canonical correlation analysis and involve linear and nonlinear relationships between the groups of variables (Böckenholt and Böckenholt, 1990; Cook et al., 1996; Thorndike, 2000; Hardoon et al., 2004; Thompson, 2005).
Canonical correlation depends on finding a linear function (linear combination) of $X_1, X_2, \ldots, X_q$ (the variable $U$) and a linear function (linear combination) of $Y_1, Y_2, \ldots, Y_p$ (the variable $V$). The functions are selected so that the correlation between $U$ and $V$ is the greatest. In that case there will be $r$ canonical correlations, where $r$ is equal to the smaller of $p$ and $q$ (Gittins, 1985).
The aim of the canonical correlation analysis is to get a simple description of the structure of the relationship between subgroups of variables.
The paper is organized as follows. Canonical correlation analysis is described in Section 2. Some important definitions are explained in Section 3. The test of significance for canonical correlation is discussed in Section 4. The statistical analysis, which involves data collection and empirical results, is presented in Section 5. Some concluding remarks are given in Section 6.

Canonical Correlation Analysis
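Suppose that there are two groups of variables, $X = [X_1, X_2, \ldots, X_q]'$ and $Y = [Y_1, Y_2, \ldots, Y_p]'$. They have variance-covariance matrices $\Sigma_{xx}$ and $\Sigma_{yy}$ respectively, and $s = \min(p, q)$. The basic objective of canonical correlation is to find the canonical variables
$$U = a'X, \qquad V = b'Y,$$
where the coefficient vectors $a$ and $b$ are chosen so that the correlation between $U$ and $V$ is as large as possible. Suppose that the joint variance-covariance matrix of the vector $(X_1, X_2, \ldots, X_q, Y_1, Y_2, \ldots, Y_p)' = (X', Y')'$ is
$$\Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}.$$
Also, the joint variance-covariance matrix of the sample is
$$S = \begin{bmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{bmatrix}.$$
Therefore, the correlation between $U$ and $V$ is as follows:
$$\rho_{UV} = \frac{a'\Sigma_{xy}b}{\sqrt{(a'\Sigma_{xx}a)(b'\Sigma_{yy}b)}}.$$
Because $\rho_{UV}$ involves the canonical variables $U$ and $V$, it is called the canonical correlation.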
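From the previous two equations, the correlation matrix can be expressed in the same partitioned form:
$$R = \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & R_{yy} \end{bmatrix},$$
and the canonical correlations $r_i$ can be obtained by a second method, as the solutions of the eigenvalue problems
$$(S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx} - r_i^2 I)\,a_i = 0, \qquad (S_{yy}^{-1}S_{yx}S_{xx}^{-1}S_{xy} - r_i^2 I)\,b_i = 0,$$
$$(R_{xx}^{-1}R_{xy}R_{yy}^{-1}R_{yx} - r_i^2 I)\,c_i = 0, \qquad (R_{yy}^{-1}R_{yx}R_{xx}^{-1}R_{xy} - r_i^2 I)\,d_i = 0,$$
where $(a_i, b_i)$ are the eigenvectors obtained from the matrix $S$ and $(c_i, d_i)$ are the eigenvectors obtained from the matrix $R$. Standardized coefficients can also be calculated according to the following formula:
$$c_i = D_x a_i, \qquad d_i = D_y b_i,$$
where $D_x$ and $D_y$ are the diagonal matrices of the sample standard deviations of $X$ and $Y$. The canonical variables can then be expressed in terms of the standardized variables $(Z_x, Z_y)$ as
$$U_i = c_i'Z_x, \qquad V_i = d_i'Z_y$$
(Timm, 2002; Black et al., 1998).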
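The eigenvector route described above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not the SPSS procedure used in this paper. The function name `canonical_correlations`, the synthetic data, and the choice of 5 and 4 variables are assumptions made only for the example. It solves the eigenproblem built from the blocks of the sample covariance matrix $S$ and rescales the coefficient vectors so that each canonical variable has unit variance.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations plus raw coefficient matrices A (for X) and B (for Y)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centre each column
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)        # blocks of the sample covariance matrix S
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    # X-side eigenproblem: Sxx^{-1} Sxy Syy^{-1} Syx a = r^2 a
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eigvals, eigvecs = np.linalg.eig(M)
    eigvals, eigvecs = eigvals.real, eigvecs.real   # imaginary parts are rounding noise
    order = np.argsort(eigvals)[::-1]               # largest root first
    s = min(X.shape[1], Y.shape[1])
    r2 = np.clip(eigvals[order][:s], 0.0, 1.0)
    A = eigvecs[:, order[:s]]

    # Each b is proportional to Syy^{-1} Syx a (assumes every retained root is nonzero)
    B = np.linalg.solve(Syy, Sxy.T @ A)

    # Rescale so that every canonical variable has unit sample variance
    A = A / np.sqrt(np.sum(A * (Sxx @ A), axis=0))
    B = B / np.sqrt(np.sum(B * (Syy @ B), axis=0))
    return np.sqrt(r2), A, B

# Synthetic stand-in data (NOT the hospital data): 80 cases, 5 X-variables, 4 Y-variables
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
Y = 0.6 * X[:, :4] + rng.normal(size=(80, 4))

r, A, B = canonical_correlations(X, Y)
U = (X - X.mean(axis=0)) @ A         # canonical variables for the X group
V = (Y - Y.mean(axis=0)) @ B         # canonical variables for the Y group
print(np.round(r, 3))
```

With 5 and 4 variables the sketch returns $\min(p, q) = 4$ canonical correlations in decreasing order, and the sample correlation between the $i$-th columns of `U` and `V` reproduces the $i$-th canonical correlation.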

Canonical Function
Represents the relationship (correlation) between two structures (canonical variables). Each canonical function has two canonical variables, one for the group of independent variables and the other for the group of dependent variables. The strength of this relationship is given by the canonical correlation.

Canonical Loadings
A measure of the simple linear correlation between the original variables of a group and the canonical variables of the same group. The interpretation of the canonical loadings is similar to the interpretation of factor loadings in factor analysis.

Canonical Cross-Loadings
Represent the correlation between each original independent or dependent variable and the canonical variables of the opposite group, for example: the independent variables correlated with the dependent canonical variables, and the dependent variables correlated with the independent canonical variables.
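The loadings and cross-loadings defined above can be illustrated with a short sketch. It is written in Python with NumPy and obtains the canonical scores through a QR and singular value decomposition route, which is a different numerical technique from the SPSS output reported in this paper. The helper names `canonical_scores` and `column_correlations` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def canonical_scores(X, Y):
    """Canonical correlations r and canonical scores U (X group), V (Y group)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)                 # orthonormal basis of the centred X columns
    Qy, _ = np.linalg.qr(Yc)                 # orthonormal basis of the centred Y columns
    W, r, Zt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    return r, Qx @ W, Qy @ Zt.T              # singular values are the canonical correlations

def column_correlations(data, scores):
    """Pearson correlation of every column of `data` with every column of `scores`."""
    Dz = (data - data.mean(axis=0)) / data.std(axis=0)
    Sz = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return Dz.T @ Sz / data.shape[0]

# Synthetic stand-in data (NOT the hospital data): 80 cases, 5 X-variables, 4 Y-variables
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
Y = 0.6 * X[:, :4] + rng.normal(size=(80, 4))

r, U, V = canonical_scores(X, Y)

loadings_x = column_correlations(X, U)   # X variables with their own canonical variables
loadings_y = column_correlations(Y, V)   # Y variables with their own canonical variables
cross_x = column_correlations(X, V)      # X variables with the Y-group canonical variables
cross_y = column_correlations(Y, U)      # Y variables with the X-group canonical variables
print(np.round(loadings_x, 2))
print(np.round(cross_x, 2))
```

Each column of a loading matrix corresponds to one canonical function. In this construction every cross-loading equals the corresponding loading multiplied by the canonical correlation of that function, so cross-loadings are never larger in absolute value than the loadings.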

Canonical Variates
Represent the linear structure formed as the weighted sum of two or more variables. A canonical variate can be defined for either the independent or the dependent variables.

Canonical Roots
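Represent the squared canonical correlations, which are used to estimate the amount of variance shared between the optimally weighted canonical variables of the independent and the dependent variables. They are also called eigenvalues (Härdle and Simar, 2007).
We can test the null hypothesis that all canonical correlations are zero against the alternative hypothesis that at least one of them is not:
$$H_0: \rho_1 = \rho_2 = \cdots = \rho_s = 0 \qquad \text{against} \qquad H_1: \rho_i \neq 0 \ \text{for at least one } i,$$
where $\rho_i^2$ is estimated by $r_i^2$, the squared canonical correlation of the sample.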

Test of Significance for Canonical Correlation
Under Equation (13) the two groups of variables $X$ and $Y$ are linearly uncorrelated. Wilks' $\lambda$ is then approximately distributed as a chi-square variable with degrees of freedom $v = (p-k)(q-k)$, where $k$ is the number of canonical correlations already removed. To assess the statistical significance with Wilks' $\lambda$, the following statistic needs to be computed:
$$\chi^2 = -\left[\,n - 1 - \tfrac{1}{2}(p + q + 1)\right]\log \Lambda_k, \qquad \Lambda_k = \prod_{i=k+1}^{s}\left(1 - r_i^2\right),$$
where:
n: the number of cases.
log: the natural logarithm function.
q: the number of variables in the first group.
p: the number of variables in the second group.
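The sequential chi-square test can be sketched in a few lines of Python. This is a minimal illustration that assumes Bartlett's form of the chi-square approximation to Wilks' lambda, with multiplier $n - 1 - (p + q + 1)/2$ and degrees of freedom $(p-k)(q-k)$. The function name `bartlett_wilks_tests` and the canonical correlations in the example are hypothetical and are not the values obtained in this paper.

```python
import math

def bartlett_wilks_tests(r, n, p, q):
    """Sequential Wilks' lambda tests for a list r of canonical correlations.

    Step k tests whether canonical correlations k+1, ..., s are all zero.
    Returns a list of tuples (k, wilks_lambda, chi_square, degrees_of_freedom).
    """
    results = []
    for k in range(len(r)):
        # Wilks' lambda for the canonical correlations that remain after removing k
        wilks = math.prod(1.0 - ri ** 2 for ri in r[k:])
        # Bartlett's chi-square approximation (math.log is the natural logarithm)
        chi_square = -(n - 1 - (p + q + 1) / 2.0) * math.log(wilks)
        df = (p - k) * (q - k)
        results.append((k, wilks, chi_square, df))
    return results

# Hypothetical canonical correlations for illustration only
example_r = [0.7, 0.5, 0.2, 0.1]
results = bartlett_wilks_tests(example_r, n=80, p=4, q=5)
for k, wilks, chi_square, df in results:
    print(f"k={k}  lambda={wilks:.4f}  chi2={chi_square:.2f}  df={df}")
```

Each chi-square value is compared with the upper tail of a chi-square distribution with the listed degrees of freedom. A p-value can be obtained from any chi-square survival function, for example `scipy.stats.chi2.sf(chi_square, df)` if SciPy is available.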

Statistical Analysis
The statistical program SPSS was used to carry out the canonical correlation analysis through the following steps: finding the correlation matrix between all independent variables and the correlation matrix between all dependent variables; finding the correlation matrix between the independent and dependent variables in both groups; computing the Wilks' lambda test to assess the significance of the canonical correlations; finding the canonical correlations between the two groups; finding the standardized canonical coefficients in the first and second groups; creating the factor loadings matrix in the first and second groups; finding the cross loadings matrix in the first and second groups; and finding the canonical scores of the first and second groups.

Data Collection
The data were taken from Ibn Sina Hospital (surgery and fractures ward) for 80 patients with infection of the urinary tract, and a group of variables which could be influential on the disease was selected. These variables were divided into two groups: the first group was a group of personal variables and the second group was a group of pathological variables as follows:

Results and Discussion
Using Equation (5), the correlation matrix of the personal variables, the correlation matrix of the pathological variables, and the joint correlation matrix between the personal and pathological variables are displayed in Tables 1, 2 and 3 respectively. Table 4 shows that the first and second canonical correlations were significant but the rest of the canonical correlations were not significant, based on the results of the Wilks' lambda test.
Table 10 represents the factor loadings for the first group (personal variables). Through the first factor we see that the independent variables ($X_2$, $X_5$) have a simple linear correlation with the corresponding canonical independent variables, while the variables ($X_1$, $X_2$, $X_3$) showed a linear relationship with the corresponding canonical independent variables in Group II. However, the rest of the factors (third and fourth) cannot be relied upon to describe the data, because the canonical correlation coefficients were not significant according to the Wilks' lambda test.
Through the first factor it can be seen that the independent variable ($X_2$) has a linear relationship with the canonical dependent variables. However, the second, third and fourth factors could not be relied upon in the description of the data, because the canonical correlation coefficients were not significant according to the Wilks' lambda test.
Table 13 represents the cross factor loadings for the second group (pathological variables). Through the first factor it is shown that the dependent variable ($Y_2$) has a linear relationship with the canonical independent variables. In the second cross factor loading, no variable appears to have a notable relationship with the corresponding canonical variables. The third and fourth factors could not be relied upon in the description of the data, because the canonical correlation coefficients were not significant according to the Wilks' lambda test. The canonical scores of the first and second groups are shown in Table 14 in Appendix A.

Conclusions
The canonical correlation analysis (CCA) method is very useful in the interpretation of data, because it discovers the structures and relationships between two sets of multi-dimensional variables; such sets of variables often occur in medical data. A result is considered significant when its significance value (sig.) is less than or equal to 0.05. The Wilks' lambda test showed that the first and second canonical correlations were significant and the rest of the canonical correlations were not significant.
There is a strong relationship between the first group (personal variables) and the second group (pathological variables), because the canonical function maximizes the correlation between the two groups. Through the factor loadings matrix, the canonical variables that have a relationship with the original variables have been identified. There are significant and non-significant relationships in the results of the factor loadings and cross factor loadings. These results are very important for interpreting the correlation relationships between the dependent and the independent variable groups.
Appendix A

Table 1. Correlation Matrix between the Independent Variables

In Table 5, which contains the first, second, third and fourth canonical correlations, the canonical correlation is strong for the first pair of canonical variables extracted from the canonical correlation function. The rest of the canonical correlations were weak, which indicates a weak relationship between the canonical variables extracted from the remaining canonical correlation functions.

Table 11 represents the cross loadings of the first group (personal variables).

Table 12 represents the canonical loadings for the second group (pathological variables). Through the first factor it is shown that the variable ($Y_2$) has a simple linear correlation with the corresponding canonical dependent variables, while the variable ($Y_1$) has a linear relationship with the corresponding canonical dependent variables in the second group. The rest of the factors (third and fourth) cannot be relied upon to describe the data, because the canonical correlation coefficients were not significant according to the Wilks' lambda test.

Table 14. Canonical Scores for the First and Second Groups