A Binary Search Algorithm for Correlation Study of Decay Centrality vs. Degree Centrality and Closeness Centrality

Results of a correlation study (using Pearson's correlation coefficient, PCC) between decay centrality (DEC) and both degree centrality (DEG) and closeness centrality (CLC) for a suite of 48 real-world networks indicate an interesting trend: PCC(DEC, DEG) decreases with increase in the decay parameter δ (0 < δ < 1) and PCC(DEC, CLC) decreases with decrease in δ. We make use of this trend of monotonic decrease in the PCC values (from both sides of the δ-search space) and propose a binary search algorithm that, given a threshold value r for the PCC, could be used to identify a value of δ for a real-world network such that PCC(DEC, DEG) ≥ r as well as PCC(DEC, CLC) ≥ r; if such a value exists, we say there exists a positive δ-space_r. We show the use of the binary search algorithm to find the maximum threshold PCC value r_max (such that δ-space_rmax is positive) for a real-world network. We observe a very strong correlation between r_max and PCC(DEG, CLC), and we observe that real-world networks with a larger variation in node degree are more likely to have a lower r_max value and vice versa.


Introduction
The Decay Centrality (DEC) metric is a parameter-driven centrality metric that has not been explored much in the literature for complex network analysis. Decay centrality is a measure of the closeness of a node to the rest of the nodes in the network (Jackson, 2010). However, unlike closeness centrality (CLC; Freeman, 1979), the importance given to the distance (typically, the number of hops if the edges do not have weights) is weighted in terms of a parameter called the decay parameter δ (0 < δ < 1). The formulation for computing the decay centrality of a vertex v_i for a particular value of the decay parameter δ (Jackson, 2010) is given in Section 2.3. The decay parameter δ essentially controls how important a node v_j is to a node v_i (v_i ≠ v_j) when the two nodes are at a distance d(v_i, v_j) from each other. If δ is smaller, nearby nodes are weighted significantly more than nodes farther away. If δ is larger, every node is given almost the same importance irrespective of its distance. As a result, if δ is closer to 0, the decay centrality of the vertices is more likely to exhibit a very strong positive correlation with the degree centrality of the vertices; if δ is closer to 1, the decay centrality of the vertices is more likely to exhibit a very strong positive correlation with the closeness centrality of the vertices.
The motivation for our research came from the initial results (see Figures 1 and 16 for sample results) of our correlation study (conducted with a precision level of ε = 0.01), which indicated that the Pearson's correlation coefficient PCC(DEC_δ, DEG) decreases with increase in δ from 0.01 to 0.99 and PCC(DEC_δ, CLC) decreases with decrease in δ from 0.99 to 0.01. Such a trend was observed for all the 48 real-world networks used in the correlation study, whose spectral radius ratio for node degree (Meghanathan, 2014) ranges from 1.01 to 5.51. In this paper, we show that this trend could be exploited by developing an efficient binary search algorithm to determine (given a threshold PCC value of r) whether there exists a δ value for which PCC(DEC_δ, DEG) as well as PCC(DEC_δ, CLC) are both greater than or equal to r. If such a δ value is found to exist, we say that there is a positive δ-space_r for the real-world network with respect to the threshold PCC (r) for DEC vs. DEG and CLC. We demonstrate the use of the binary search algorithm to determine the maximum value for the threshold PCC (r_max) for a real-world network such that δ-space_rmax is positive. Our approach is significantly more efficient than the brute-force approach of computing the DEC values for all possible values of δ (note that the δ-search space is a continuous space rather than discrete).

Figure 1. Sample PCC(DEC_δ, DEG) vs. PCC(DEC_δ, CLC) Distributions of Real-World Networks. Panels: Dolphin Network (Lusseau et al., 2003); US Politics Book Network (Krebs, 2003); Network Science Co-author Network (Newman, 2006).

The rest of the paper is organized as follows: In Section 2, we review the centrality metrics (DEG, CLC and DEC) and the Pearson's correlation measure, and explain their computation with an example graph. Section 3 introduces the notion of δ-space_r for a threshold PCC (r) for DEC-DEG and DEC-CLC correlation (and its computation on the example graph of Section 2). Section 4 describes the proposed binary search algorithm to search for a δ value in δ-space_r and illustrates its execution with the running example graph of Sections 2-3 for a successful search and an unsuccessful search. Section 4 also illustrates the use of the proposed binary search algorithm to determine the maximum threshold PCC (r_max) for a network. Section 5 introduces the real-world networks that are analyzed in this paper. Section 6 presents the δ-space_r values and the r_max values for the 48 real-world networks obtained as a result of executing the binary search algorithm. Section 6 also compares the performance of the binary search algorithm with the brute-force approach with respect to the number of decay centrality computations needed before deciding whether a real-world network has a positive δ-space_r or not. Section 7 discusses related work and highlights the contributions of our paper. Section 8 concludes the paper.

Review of Centrality Metrics and Pearson's Correlation Measure
The centrality metrics that are of interest in this research are degree centrality (DEG), closeness centrality (CLC) and decay centrality (DEC). In this section, we briefly review these three metrics and their computation using a running example graph. We also review the Pearson's correlation measure and its computation with respect to the DEG and CLC metrics for the running example graph.

Degree Centrality
The degree centrality (DEG) of a vertex is the number of edges incident on the vertex (i.e., the number of its neighbors). Figure 2 illustrates the degree centrality of the vertices (listed above the vertices) in the example graph used in Sections 2-3. A key weakness of the degree centrality metric is that it can take only integer values, so ties among vertices (with the same degree) are quite common and unavoidable in network graphs of any size (in the graph of Figure 2, we observe five of the nine vertices to have a degree of 3). It takes O(V) time to go through the adjacency list of a vertex; hence, it takes O(V^2) time to compute the degree centrality for a graph of V vertices.

Closeness Centrality
The closeness centrality (CLC) of a vertex (Freeman, 1979) is a measure of the closeness of the vertex to the rest of the vertices in a graph. The CLC of a vertex is computed as the inverse of the sum of the hop counts of the shortest paths from the vertex to the rest of the vertices in the graph. To determine the CLC of a vertex, we could use the Θ(V+E) Breadth First Search (BFS) algorithm (Cormen et al., 2009) to determine a shortest path tree rooted at the vertex and find the sum of the level numbers of the vertices on this shortest path tree. We want to maintain the convention that the larger the centrality value for a vertex, the more important the vertex. Hence, we find the inverse of the final sum of the level numbers of the vertices on the BFS tree of a vertex and use it as the CLC of the vertex (rather than using just the sum of the level numbers as the CLC). Since we need to run the BFS algorithm once for each vertex, the overall time complexity to determine the CLC of the vertices is Θ(V(V+E)) = Θ(V^2 + VE). Figure 3 illustrates the distance matrix (hop counts of the shortest paths between any two vertices) for the example graph of Figure 2 and also displays the CLC of the vertices. Vertex 1 is the closest vertex to the rest of the vertices (its sum of distances is 12, the minimum) and hence has the largest CLC value of 1/12 = 0.083.
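The BFS-based computation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the adjacency-list dictionary `adj`, the helper names `bfs_hops` and `closeness_centrality`, and the 5-vertex path graph are assumptions made for the example (the graph of Figure 2 is not reproduced here).

```python
from collections import deque


def bfs_hops(adj, source):
    """Hop counts from source to every reachable vertex (unit edge weights)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1  # level number of v on the BFS tree
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """CLC(v) = 1 / (sum of hop counts from v to all other vertices)."""
    clc = {}
    for v in adj:
        total = sum(bfs_hops(adj, v).values())  # one BFS per vertex
        clc[v] = 1.0 / total if total > 0 else 0.0
    return clc


# Hypothetical path graph 0-1-2-3-4 (an assumption, not the paper's graph).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
clc = closeness_centrality(path)
```

On this path graph the middle vertex has the smallest distance sum (6) and therefore the largest CLC, mirroring the role of vertex 1 in Figure 3.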

Decay Centrality
Decay centrality (DEC) is a measure of the closeness of a node to the rest of the nodes in the network (Jackson, 2010). However, unlike closeness centrality, the importance given to the distance (typically, the number of hops if the edges do not have weights) is weighted in terms of a parameter called the decay parameter δ (0 < δ < 1). The formulation for computing the decay centrality of a vertex v_i for a particular value of the decay parameter δ is (Jackson, 2010):

$$\mathrm{DEC}_{\delta}(v_i) = \sum_{v_j \neq v_i} \delta^{\,d(v_i, v_j)}$$

The decay parameter δ essentially controls how important a node v_j is to a node v_i (v_i ≠ v_j) when the two nodes are at a distance d(v_i, v_j) from each other. Nodes that have a higher decay centrality are more likely to be nodes that have several neighbors and are also much closer to the rest of the nodes in the network (Tsakas, 2016). Figure 4 presents the decay centrality of the vertices in the running example graph for different values of the decay parameter δ. We also illustrate sample calculations of the decay centrality of vertex 1 for three different values of δ.
On a graph of V vertices and E edges, it takes Θ(V^3) time to compute the distance matrix (the shortest path weights between any two nodes in the network) using the Floyd-Warshall algorithm (Cormen et al., 2009) for weighted graphs, and Θ(V(V+E)) time to compute the distance matrix (the hop counts of the shortest paths between any two nodes in the network) using the Θ(V+E) BFS algorithm for graphs with unit edge weights. Since E = O(V^2), we could say that, in general, it takes O(V^3) time to compute the distance matrix for any graph. Given the distance matrix for a graph, it takes Θ(V) time to compute the decay centrality of a particular vertex v_i, as we have to find δ^d(v_i, v_j) for every vertex v_j ≠ v_i. Hence, given the distance matrix for a graph, it would take Θ(V^2) time to compute the decay centrality of all the vertices. Overall, given a graph of V vertices, the time complexity to compute decay centrality is O(V^3) + Θ(V^2) = O(V^3). Thus, the time complexity to compute the decay centrality of the vertices is dominated by the time complexity to compute the distance matrix.
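The two stages discussed above (distance matrix first, then one Θ(V^2) pass per value of δ) can be sketched as below. This is only an illustrative sketch under stated assumptions: the function names `hop_distance_matrix` and `decay_centrality` and the 5-vertex path graph are hypothetical, and the graph is assumed to be connected with unit edge weights.

```python
from collections import deque


def hop_distance_matrix(adj):
    """All-pairs hop counts using one BFS per vertex."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist


def decay_centrality(dist, delta):
    """DEC_delta(v_i) = sum over v_j != v_i of delta ** d(v_i, v_j)."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return {
        vi: sum(delta ** d for vj, d in row.items() if vj != vi)
        for vi, row in dist.items()
    }


# Hypothetical path graph 0-1-2-3-4 (an assumption, not the paper's graph).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
dist = hop_distance_matrix(path)        # computed once
dec_half = decay_centrality(dist, 0.5)  # reusable for any other delta
```

Because the distance matrix does not depend on δ, it needs to be computed only once; each additional δ value then costs only the second, cheaper stage.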

Pearson's Correlation Measure
We use the Pearson's correlation coefficient (PCC; Lay et al., 2015) as the measure for analyzing the correlation between the decay centrality (computed for different values of the decay parameter δ) and the degree centrality and closeness centrality. The Pearson's product-moment correlation, when applied to centrality metrics, is a measure of the linear dependence between any two metrics in consideration (Lay et al., 2015). It is referred to as product-moment based correlation because we calculate the deviation of the data points from their mean value (the 'mean' is also referred to as the 'first moment' in statistics) and use these deviations in the formulation below to calculate the correlation coefficient. If X and Y are the datasets for two centrality metrics, let X_i and Y_i indicate the centrality values for the individual vertices v_i (1 ≤ i ≤ n, where n is the number of vertices) and let X̄ and Ȳ be the averages of the centrality values. Figure 5 illustrates the computation of the Pearson's correlation coefficient between DEG and CLC; PCC(X, Y) is calculated as follows.

$$\mathrm{PCC}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

For the running example graph of Figures 2-5, we see PCC(DEC_δ, DEG) monotonically decrease with increase in δ and PCC(DEC_δ, CLC) monotonically decrease with decrease in δ. A similar trend is also noticed for all the 48 real-world network graphs analyzed in Section 6. Using this as the basis, we define the δ-space_r for a real-world network with respect to a threshold PCC (r) as the difference between the maximum δ value for which we observe PCC(DEC_δ, DEG) ≥ r and the minimum δ value for which we observe PCC(DEC_δ, CLC) ≥ r. The δ-space_r thus corresponds to the range of δ values [δ_min ... δ_max] that could be chosen from the open interval (0...1) to determine decay centrality values that exhibit a PCC of the threshold value r or above with both degree and closeness centralities. Note that we did not choose the interval (0, 1] for δ, as δ = 1 would correspond to the component size and not quantify the centrality of the vertices. Along the same lines, we did not choose the interval [0, 1) for δ, as δ = 0 would make the decay centrality of the vertices zero. Quantitatively, δ-space_r is defined as follows, where ε corresponds to the level of precision used for δ in the range (0...1). Note that δ-space_r could be determined for any threshold value of the Pearson's correlation coefficient (r) of interest.

$$\delta_{\max}(r) = \max\{\delta : \mathrm{PCC}(\mathrm{DEC}_{\delta}, \mathrm{DEG}) \ge r\}, \qquad \delta_{\min}(r) = \min\{\delta : \mathrm{PCC}(\mathrm{DEC}_{\delta}, \mathrm{CLC}) \ge r\}$$

$$\delta\text{-space}_r = \begin{cases} \delta_{\max}(r) - \delta_{\min}(r) + \varepsilon, & \text{if } \delta_{\max}(r) \ge \delta_{\min}(r) \\ \delta_{\max}(r) - \delta_{\min}(r), & \text{otherwise} \end{cases}$$

If δ-space_r is negative, it implies there is not even one single δ value for which DEC_δ would exhibit the threshold PCC of r or above with both DEG and CLC. Hence, we do not add the precision level ε in the δ-space_r formulation for such cases. As a worked instance of the positive case, δ_max - δ_min + ε = 0.80 - 0.03 + 0.01 = 0.78. Note that 0.78 is also PCC(DEG, CLC) for the example graph (computed in Figure 5).
Table 1 displays the δ-space_r values for different values of the threshold PCC (r), ranging from 0.70 to 0.95, for the running example graph of Figures 2-5. Note that δ-space_r decreases as the threshold PCC (r) value increases.
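As a rough illustration of how a δ-space_r value could be obtained by scanning the δ grid at precision ε, consider the sketch below. It is not the authors' code: the function names, the 5-vertex path graph, and the choice to return `None` when no δ on the grid meets the threshold for one of the two metrics are all assumptions made here. The graph is assumed to be connected and non-regular so that the PCC is defined.

```python
from collections import deque
from math import sqrt


def hop_distances(adj):
    """All-pairs hop counts using one BFS per vertex."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist


def pearson(xs, ys):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


def delta_space(adj, r, eps=0.01):
    """delta-space_r on the grid eps, 2*eps, ..., 1 - eps (None if undefined)."""
    nodes = list(adj)
    dist = hop_distances(adj)
    deg = [len(adj[v]) for v in nodes]
    clc = [1.0 / sum(dist[v].values()) for v in nodes]
    steps = round(1.0 / eps)
    ok_deg, ok_clc = [], []
    for k in range(1, steps):
        delta = k * eps
        dec = [sum(delta ** d for u, d in dist[v].items() if u != v)
               for v in nodes]
        if pearson(dec, deg) >= r:
            ok_deg.append(delta)
        if pearson(dec, clc) >= r:
            ok_clc.append(delta)
    if not ok_deg or not ok_clc:
        return None  # assumption: left undefined when a threshold is never met
    diff = max(ok_deg) - min(ok_clc)
    return diff + eps if diff >= 0 else diff  # eps added only when non-negative


# Hypothetical path graph 0-1-2-3-4 (an assumption, not the paper's graph).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
space_at_zero = delta_space(path, 0.0)
```

This exhaustive scan needs one decay centrality computation per grid point, which is the cost the binary search of Section 4 is meant to avoid.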

Binary Search Algorithm
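Like the standard binary search algorithm, the binary search algorithm described here maintains two indices, a left index and a right index, and the two indices approach each other during the course of the algorithm. Also, as in the standard binary search algorithm, we exit from the iterations (described below) when the left index is no longer smaller than the right index. An invariant in the binary search algorithm is that the left index is a δ value for which PCC(DEC_δ, DEG) ≥ threshold r and the right index is a δ value for which PCC(DEC_δ, CLC) ≥ threshold r. As an optimization, before proceeding to the iterations of the algorithm, one can test whether PCC(DEC_δ=initial left index, CLC) ≥ threshold r. If so, we are done and the δ value of interest could be the value of the initial left index itself. This would be especially useful for lower values of the threshold PCC.
In each iteration, we compute the middle index as the average of the left index and the right index and compute the decay centrality for δ = middle index. If PCC(DEC_δ=middle index, DEG) ≥ r and PCC(DEC_δ=middle index, CLC) ≥ r, the middle index is a δ value of interest and the search is successful. Otherwise, one of the following cases applies:
(i) If PCC(DEC_δ=middle index, DEG) ≥ r and PCC(DEC_δ=middle index, CLC) < r, we move the left index to the middle index (as the DEC_δ-CLC correlation coefficient is less than r for δ = middle index and it can only decrease further if we move to the left of the middle index; hence, we shrink the δ-search space to the right of the middle index).
(ii) If PCC(DEC_δ=middle index, DEG) < r and PCC(DEC_δ=middle index, CLC) ≥ r, we move the right index to the middle index (as the DEC_δ-DEG correlation coefficient is less than r for δ = middle index and it can only decrease further if we move to the right of the middle index; hence, we shrink the δ-search space to the left of the middle index).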
(iii) If PCC(DEC_δ=middle index, DEG) < r and PCC(DEC_δ=middle index, CLC) < r, we exit the algorithm and declare that there is no single δ value for which DEC_δ would exhibit the threshold value of correlation coefficient or above with both DEG and CLC. This is because, as DEC_δ=middle index does not exhibit the threshold value of correlation coefficient with DEG, the δ value of interest has to be in the range [left index ... middle index); whereas, as DEC_δ=middle index does not exhibit the threshold value of correlation coefficient with CLC, the δ value of interest has to be in the range (middle index ... right index]. The two ranges do not overlap and hence there cannot be a δ value for which DEC_δ exhibits the threshold value of correlation coefficient with both DEG and CLC.
The binary search algorithm could be executed over a δ-search space of [0+ε ... 1-ε], wherein the initial values for the left index and right index are respectively 0+ε and 1-ε. The value of ε used in this paper is 0.01.
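The search procedure described in this section can be sketched as follows. This is a hedged sketch, not the authors' pseudocode: the callable `pcc_pair` (which stands for one decay centrality computation followed by the two PCC evaluations), the stopping tolerance `tol`, the explicit check of the DEG correlation at the initial left index, and the synthetic `toy_curves` are assumptions introduced here for illustration.

```python
def binary_search_delta(pcc_pair, r, eps=0.01, tol=1e-4):
    """Return (delta, computations); delta is None for an unsuccessful search.

    pcc_pair(delta) -> (PCC(DEC_delta, DEG), PCC(DEC_delta, CLC)).
    Each call counts as one decay centrality computation.
    """
    left, right = eps, 1.0 - eps
    computations = 1
    pcc_deg, pcc_clc = pcc_pair(left)
    if pcc_deg < r:
        # DEG correlation only decreases to the right, so no delta can work.
        return None, computations
    if pcc_clc >= r:
        return left, computations  # initial test with the left index succeeds
    while right - left > tol:  # tol is an assumed stopping precision
        mid = (left + right) / 2.0
        computations += 1
        pcc_deg, pcc_clc = pcc_pair(mid)
        if pcc_deg >= r and pcc_clc >= r:
            return mid, computations  # mid lies inside delta-space_r
        if pcc_deg >= r:
            left = mid   # CLC correlation too low: search to the right of mid
        elif pcc_clc >= r:
            right = mid  # DEG correlation too low: search to the left of mid
        else:
            return None, computations  # both too low: ranges cannot overlap
    return None, computations


def toy_curves(delta):
    """Hypothetical monotone PCC curves (not data from the paper)."""
    return 1.0 - delta, min(1.0, 3.0 * delta)


found, n_found = binary_search_delta(toy_curves, 0.70)
missing, n_missing = binary_search_delta(toy_curves, 0.80)
```

Passing the PCC evaluation in as a callable keeps the sketch independent of how the centralities are computed; for a real network, `pcc_pair` would compute DEC_δ from the precomputed distance matrix and correlate it with the fixed DEG and CLC vectors.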
Figure 9 presents a scenario wherein there is no need to proceed to the iterations of the binary search algorithm and a simple initial test is sufficient to determine the existence of δ-space_r. The initial test is to check whether PCC(DEC_δ=initial left index=0.01, CLC) ≥ threshold PCC of r (= 0.75, in Figure 9). Since PCC(DEC_0.01, CLC) ≥ 0.75 and we already know that PCC(DEC_0.01, DEG) ≥ 0.75, we can stop right away and declare that the δ of interest is 0.01 and that δ-space_r exists for a threshold PCC r value of 0.75.
In the unsuccessful search for a threshold PCC r value of 0.95, the middle index at the end of the fourth iteration is (0.255 + 0.3775)/2 = 0.316, and DEC_0.316 (i.e., the DEC values computed for δ = 0.316) exhibits a correlation coefficient of 0.95 or above with neither DEG nor CLC. Hence, the range for DEC to exhibit a correlation coefficient of 0.95 or above with DEG has to be towards the left of this middle index (i.e., < 0.316) and the range for DEC to exhibit a correlation coefficient of 0.95 or above with CLC has to be towards the right of this middle index (i.e., > 0.316). This is not possible and hence we stop and declare that δ-space_r=0.95 does not exist for this graph.
The basic operation (the most time-consuming step) of the binary search algorithm is the computation of the decay centrality of the vertices in each iteration as well as during the initial test with the left index. Since the δ-search space is a continuous search space (i.e., real numbers from 0 to 1), we are not able to theoretically quantify the average and worst-case number of times the basic operation would be executed as part of the binary search algorithm. Nevertheless, as seen in the experimental results presented in Section 6 for real-world networks, we anticipate the number of times the basic operation is executed as part of the binary search algorithm (both the average and worst case) to be significantly smaller than the number of times the basic operation is executed as part of a brute-force search.
We now explain the use of the binary search algorithm to determine the maximum value of the threshold PCC (r_max) for which there exists a positive δ-space. The idea is to start with a tentative r_max value of 1.0 and use the binary search algorithm to test if there exists a positive δ-space_tentative rmax. If so, we stop and declare the tentative r_max as the final r_max value. Otherwise, we reduce the value of the tentative r_max by 0.01 and repeat the above test until we find a tentative r_max for which there exists a positive δ-space. We also determine the δ value (referred to as δ_rmax) that was found to be part of the δ-space_rmax. The r_max value for the example graph in Figures 2-5 is 0.94 and the corresponding δ_rmax value is 0.2971. This is evident from Figure 6, in which we show the decrease in the PCC(DEC_δ, DEG) values and the increase in the PCC(DEC_δ, CLC) values with increase in δ from 0.01 to 0.99. The values of (δ_rmax, r_max) = (0.2971, 0.94) correspond to the intersection point between the two PCC curves. The δ_rmax value of 0.2971 could be approximated to the value of 0.30 as it appears in Figure 6.
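The r_max procedure above (start at 1.0, lower the tentative threshold in steps of 0.01, stop at the first successful search) might look like the following sketch. The helper `find_delta` is a compact, assumed rendering of the binary search, and `toy_curves` is a hypothetical pair of monotone PCC curves, not data from the paper; the tolerance `tol` is likewise an assumption.

```python
def find_delta(pcc_pair, r, eps=0.01, tol=1e-4):
    """Binary search for a delta with both PCC values >= r (None if absent)."""
    left, right = eps, 1.0 - eps
    pcc_deg, pcc_clc = pcc_pair(left)
    if pcc_deg < r:
        return None
    if pcc_clc >= r:
        return left
    while right - left > tol:
        mid = (left + right) / 2.0
        pcc_deg, pcc_clc = pcc_pair(mid)
        if pcc_deg >= r and pcc_clc >= r:
            return mid
        if pcc_deg >= r:
            left = mid
        elif pcc_clc >= r:
            right = mid
        else:
            return None
    return None


def find_r_max(pcc_pair, step=0.01):
    """Lower the tentative r_max from 1.0 until a delta is found."""
    steps = round(1.0 / step)
    for k in range(steps, -steps - 1, -1):
        r = k / steps  # 1.00, 0.99, ..., computed without accumulating error
        delta = find_delta(pcc_pair, r)
        if delta is not None:
            return r, delta
    return None, None


def toy_curves(delta):
    """Hypothetical monotone PCC curves that intersect at (0.25, 0.75)."""
    return 1.0 - delta, min(1.0, 3.0 * delta)


r_max, delta_rmax = find_r_max(toy_curves)
```

For these toy curves the reported r_max is 0.74 rather than the exact intersection value of 0.75, because at r = 0.75 only the single point δ = 0.25 qualifies and a search with finite precision does not land on it. This illustrates why the reported r_max is tied to the chosen precision.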

Real-World Network Graphs
In this section, we introduce the 48 real-world networks analyzed in this paper. Table 2 lists the three-character acronym, the name and the network type, the number of nodes and edges, and the spectral radius ratio for node degree (λ_sp) of each network. All the real-world networks are modeled as undirected graphs. The spectral radius ratio for node degree (Meghanathan, 2014) is a measure of the variation in node degree and is calculated as the ratio of the principal eigenvalue (Bonacich, 1987) of the adjacency matrix of the graph to the average node degree. The spectral radius ratio for node degree is independent of the number of vertices and the actual degree values for the vertices in the graph. The spectral radius ratio for node degree is always greater than or equal to 1.0; the farther the ratio is from 1.0, the larger the variation in node degree. The spectral radius ratio for node degree for the real-world network graphs analyzed in this paper ranges from 1.01 to 5.51, indicating that the graphs range from random networks (Renyi, 1959) with smaller variation in node degree to scale-free networks (Barabasi & Albert, 1999) with larger variation in node degree.
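The spectral radius ratio for node degree can be illustrated with the short sketch below. The use of power iteration on the shifted matrix A + I (so that the iteration also converges on bipartite graphs) is a choice made here and is not taken from the paper; the function name and the two small test graphs are likewise hypothetical, and the graph is assumed to be connected and undirected.

```python
def spectral_radius_ratio(adj, max_iter=10000, tol=1e-12):
    """Principal eigenvalue of the adjacency matrix / average node degree."""
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(max_iter):
        # y = (A + I) x; the shift by I is an assumption of this sketch.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        scale = max(y.values())  # estimate of (principal eigenvalue + 1)
        x = {v: y[v] / scale for v in nodes}
        new_lam = scale - 1.0
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    avg_degree = sum(len(adj[v]) for v in nodes) / len(nodes)
    return lam / avg_degree


# Hypothetical graphs: a star with four leaves and a 5-cycle.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
ratio_star = spectral_radius_ratio(star)
ratio_cycle = spectral_radius_ratio(cycle)
```

The regular 5-cycle has no variation in node degree and yields a ratio of 1.0, while the star (principal eigenvalue 2, average degree 1.6) yields 1.25, consistent with the interpretation that larger ratios indicate larger degree variation.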
The networks considered cover a broad range of categories, beginning with I. Acquaintance network (12) and continuing with the categories described below. A friendship network is a kind of social network in which the participant nodes closely know each other and the relationship is not captured over an observation period. A co-appearance network is a network typically extracted from novels/books in such a way that two characters or words (modeled as nodes) are connected if they appear alongside each other. An employment network is a network in which the interaction/relationship between people is primarily due to their employment requirements and not due to any personal liking. A citation network is a network in which two papers (nodes) are connected if one paper cites the other paper as a reference. A collaboration network is a network of researchers/authors who are listed as co-authors in at least one publication.
A biological network is a network that models the interactions between genes, proteins, animals of a species, etc. A political network is a network of entities (typically politicians) involved in politics. A game network is a network of teams or players playing for different teams and their associations. A literature network is a network of books/papers/terminologies/authors (other than collaboration, citation or co-authorship) involved in a particular area of literature. A transportation network is a network of entities (like airports and their flight connections) involved in public transportation. A trade network is a network of countries/people involved in certain trade.

Results of Correlation Study
In this section, we present and analyze the results of our correlation study for the 48 real-world networks. Table 3 lists the δ-space_r values and the number of decay centrality computations (the basic operation) for the binary search algorithm. The values for the threshold PCC in Table 3 range from 0.60 to 0.95. As expected, the δ-space_r values decrease as the value for the threshold PCC (r) increases. The medians of the δ-space_r values for the different threshold r values are also given at the bottom of Table 3. From a median value of 0.99 for r = 0.60, the median reduces to 0.515 for r = 0.80 and to -0.14 for r = 0.95. Cells with negative δ-space_r values are highlighted in light pink. Though the author Facebook network had negative δ-space_r values for r values starting from 0.60, we notice that negative δ-space_r values for the real-world networks are more prominent for r values starting from 0.80. More than 40% and 50% of the real-world networks had negative δ-space_0.90 and δ-space_0.95 values, respectively. Nevertheless, there are six networks that continue to have a δ-space_r value of 0.99 for all r values presented in Table 3. A closer look at these six networks reveals that the λ_sp (spectral radius ratio for node degree) values for all six are less than 1.5. A further analysis of the δ-space_0.95 values of the real-world networks and the λ_sp values indicates that out of the 25 networks that had a λ_sp value above 1.5, 22 networks (close to 90%) incurred negative values for δ-space_0.95, whereas out of the remaining 23 networks (that had a λ_sp value below 1.5), only 7 networks (close to 30%) incurred negative values for δ-space_0.95. This indicates a trend that networks with lower variation in node degree are more likely to have a positive δ-space_r value (even for larger values of the threshold PCC r), whereas networks with larger variation in node degree are more likely to incur negative values for δ-space_r. In other words, for networks with larger variation in node degree, it is more likely that there is not even a single δ value for which we could expect a stronger correlation for DEC with both DEG and CLC simultaneously.
Figure 12 presents the distribution of the number (#) of decay centrality computations vs. the δ-space_r values of the real-world networks for different values of the threshold PCC (r). We observe the number of decay centrality computations to be lower for larger positive values of δ-space_r as well as for larger negative values of δ-space_r. On the other hand, the number of decay centrality computations is relatively larger for lower positive as well as lower negative values of δ-space_r. This is because, for larger values of δ-space_r, it is more likely that the middle index of the binary search algorithm will soon correspond to a δ value that falls within the range for δ-space_r. If the δ-space_r value is smaller, it takes relatively more iterations (and, as a result, a larger number of decay centrality computations) before the algorithm could identify a δ value that falls within the range for δ-space_r. Note that if δ-space_r is negative, it means there does not exist a δ value for which PCC(DEC_δ, DEG) ≥ r as well as PCC(DEC_δ, CLC) ≥ r; this also means that there exist one or more δ values for which PCC(DEC_δ, DEG) < r as well as PCC(DEC_δ, CLC) < r.

Binary Search vs. Brute Force Search
We now present a comparison of the time complexity (for both successful searches and unsuccessful searches, considered together and separately) on the basis of the number of decay centrality computations incurred with the proposed binary search approach vs. brute force search approaches from the left as well as from the right of the δ-search space. Under the brute force approach from the left, for each real-world network and a given threshold PCC (r), we iterate through the values of δ from 0.01 to 0.99 (in this order), in increments of 0.01, and determine the smallest δ value (if one exists) for which PCC(DEC_δ, DEG) ≥ r and PCC(DEC_δ, CLC) ≥ r. Under the brute force approach from the right, we iterate through the values of δ from 0.99 to 0.01 (in this order), in decrements of 0.01, and determine the largest δ value (if one exists) for which PCC(DEC_δ, DEG) ≥ r and PCC(DEC_δ, CLC) ≥ r. To be fair to the brute force approaches, we adopt the same condition (used for the binary search approach) to terminate the search early (instead of searching through the entire δ-search space) if we encounter a δ value for which both PCC(DEC_δ, DEG) < r and PCC(DEC_δ, CLC) < r. When such a δ value is encountered for a given threshold PCC (r) for a real-world network, it implies the δ-space_r for the real-world network is negative and the search for a δ to satisfy the threshold PCC value (r) would be unsuccessful. The number of decay centrality computations incurred until encountering the smallest δ value (if proceeding from the left) or the largest δ value (if proceeding from the right) that satisfies the condition for a successful search, or the δ value that satisfies the condition for an unsuccessful search, is recorded. For the binary search and the two variants of brute force search, and for each value of the threshold PCC (r from 0.60 to 0.95, in increments of 0.05), we determine the average number of decay centrality computations (averaged over all the 48 real-world networks) for successful and unsuccessful searches considered together and considered separately, as well as the worst-case number of decay centrality computations (the maximum of the number of decay centrality computations incurred among the 48 real-world networks).
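The two brute force baselines described above, including the early termination rule, can be sketched as follows. As before, this is only an illustration under assumptions: `pcc_pair` stands for one decay centrality computation followed by the two PCC evaluations, and `toy_curves` is a hypothetical pair of monotone curves rather than data from any of the 48 networks.

```python
def brute_force(pcc_pair, r, from_left=True, eps=0.01):
    """Scan the delta grid; return (delta, computations), delta None on failure."""
    steps = round(1.0 / eps)
    ks = range(1, steps) if from_left else range(steps - 1, 0, -1)
    computations = 0
    for k in ks:
        delta = k / steps  # 0.01, 0.02, ... or 0.99, 0.98, ...
        computations += 1
        pcc_deg, pcc_clc = pcc_pair(delta)
        if pcc_deg >= r and pcc_clc >= r:
            return delta, computations  # successful search
        if pcc_deg < r and pcc_clc < r:
            return None, computations   # early exit: delta-space_r is negative
    return None, computations


def toy_curves(delta):
    """Hypothetical monotone PCC curves (not data from the paper)."""
    return 1.0 - delta, min(1.0, 3.0 * delta)


hit, n_hit = brute_force(toy_curves, 0.70, from_left=True)
miss, n_miss = brute_force(toy_curves, 0.80, from_left=True)
```

On these toy curves the scan from the left needs 24 decay centrality computations to reach the first qualifying δ (0.24) for r = 0.70, since every grid point before it has to be evaluated in turn.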
We observe the binary search algorithm to comprehensively outperform the brute force search methods (from the left and from the right). In Figure 13, the average-case numbers of decay centrality computations (when the successful and unsuccessful searches are considered together and when considered alone) for the binary search algorithm are comparable to (or even lower than, in most of the cases) the logarithm of the number of decay centrality computations incurred with the brute force search methods. The worst-case number of decay centrality computations incurred with the binary search method is appreciably lower than the square root of the worst-case number of decay centrality computations incurred with the brute force search methods.
It is to be noted that the number of decay centrality computations incurred with the binary search method (for both ...

As in the case of δ-space_r, we also observe networks with larger variation in node degree to incur lower values for r_max and networks with lower variation in node degree to incur larger r_max values. Of the 27 real-world networks that had r_max values of 0.9 or above, 19 had λ_sp (spectral radius ratio for node degree) values less than 1.5. On the other hand, 12 of the 13 real-world networks with r_max values less than 0.8 had λ_sp values above 1.5. Figure 17 plots the distribution of the r_max vs. λ_sp values for the real-world networks.

Correlation between the Maximum Threshold PCC Value (r max ) and the Pearson's Correlation between Degree and Closeness Centrality Metrics
As part of further analysis, we analyzed the correlation between the r_max values observed for the real-world networks and the Pearson's correlation coefficient between DEG and CLC. We observe a very strong positive correlation between the PCC(DEG, CLC) and the r_max values. The regression equation is shown below (equation 4); the R^2 for this straight-line fit (shown in Figure 18) is 0.9485 and the Standard Error of the Residuals (SER) is 0.025, a small value given that r_max can range from -1 to 1. Figure 19 presents the distribution of the actual r_max values vs. the r_max values predicted using the actual values of PCC(DEG, CLC) and the regression equation (4); we observe the data points to lie close to the diagonal line, consistent with the small SER value for the prediction.
Predicted r_max = 0.3792 * PCC(DEG, CLC) + 0.626    (4)

Figure 18. PCC(DEG, CLC) vs. the Maximum Threshold PCC (r_max) Value for Real-World Networks

Though there is a very strong linear correlation, we observe (from Figure 18) the r_max value for a real-world network to be typically much larger than the PCC(DEG, CLC) value for the network. The median of the PCC(DEG, CLC) values is 0.728, while the median of the r_max values is 0.915. Note that, as per the ordinal scale proposed by Evans (1995), 0.8 is typically the minimum correlation coefficient value expected for two metrics to be considered to exhibit a very strong positive correlation. We observe 35 of the 48 real-world networks (i.e., more than two-thirds of the networks) to have an r_max value of 0.8 or above. Thus, though the PCC(DEG, CLC) for a real-world network might be low, we observe that there exists at least one value of δ (δ_rmax, which could be efficiently found by our binary search algorithm) for which we could simultaneously find a relatively stronger correlation between DEG and DEC as well as between CLC and DEC.
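As a small worked example of regression equation (4), the sketch below evaluates the fitted line at the reported median PCC(DEG, CLC) value. The function name `predicted_r_max` is hypothetical; the slope and intercept are the coefficients stated in equation (4).

```python
def predicted_r_max(pcc_deg_clc, slope=0.3792, intercept=0.626):
    """Evaluate regression equation (4) for a given PCC(DEG, CLC)."""
    return slope * pcc_deg_clc + intercept


# Median PCC(DEG, CLC) reported for the 48 real-world networks.
median_prediction = predicted_r_max(0.728)
```

The fitted line gives roughly 0.902 at PCC(DEG, CLC) = 0.728, which is within the reported SER of 0.025 of the observed median r_max of 0.915.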

Related Work and Our Contributions
Decay centrality has not been explored much in the literature for complex network analysis. To the best of our knowledge, ours is the first work to conduct a correlation study focusing on decay centrality. Most of the work (e.g., Li et al., 2015; Meghanathan, 2015) on correlation studies involving centrality metrics has focused on the commonly studied centrality metrics such as the neighborhood-based degree centrality and eigenvector centrality (EVC; Bonacich, 1987) and the shortest path-based betweenness centrality (BWC; Freeman, 1977) and closeness centrality. The objective of such correlation studies has typically been to identify computationally light alternatives (like DEG and its derivatives; Meghanathan, 2017) for computationally heavy metrics (such as EVC and BWC) for both real-world networks and simulated networks of theoretical models (Renyi, 1959; Barabasi & Albert, 1999). The focus of our paper is different from such typical correlation studies in the literature. We seek to explore the trend of change in the correlation coefficients between a parameter-driven centrality metric (whose values for a node change for different values of the decay parameter) and the degree and closeness centrality metrics, whose values are not parameter-driven and remain the same for a particular network.
The work most related to ours is a recent study (Tsakas, 2016) on random networks (Renyi, 1959) for which a single threshold value of the decay parameter (referred to here as δ_thresh) was observed to exist (for a particular operating condition) such that nodes with high degree centrality also had a high decay centrality computed for δ values less than δ_thresh, and nodes with high closeness centrality also had a high decay centrality computed for δ values above δ_thresh. It was observed by Tsakas (2016) that, for random networks, nodes with the largest values for degree centrality and closeness centrality are more likely to be nodes that also incur the largest values for decay centrality for almost all values of δ. In addition, nodes that had the largest decay centrality for a certain value of δ are more likely to be part of the set of nodes that had the largest degree centrality or the largest closeness centrality. The likelihood of all of the above was studied using multinomial logistic regression (Greene, 2011). Most of the other works (e.g., Chatterjee & Dutta, 2015; Kang et al., 2012) on the decay centrality metric have focused on exploring its suitability for diffusion in socio-economic networks with regard to selecting the seed nodes that could effectively propagate information about a product to putative customers. Nodes that are themselves central and connected to other central nodes (via direct links or shorter paths) in the network are typically preferred for such "agent" roles (Tsakas, 2016; Chatterjee & Dutta, 2015). The use of decay centrality vis-a-vis diffusion centrality (Kang et al., 2012) and eigenvector centrality (Ide et al., 2014; Banerjee et al., 2013) to identify such "agent" nodes for diffusion has been explored in the literature.
Our paper differs from all of the above work and is innovative along the following lines. We analyze real-world networks rather than simulated random networks. We use Pearson's correlation coefficient to study the correlation between the actual centrality values, rather than multinomial logistic regression (Greene, 2011) to study the sets of vertices that had the largest values of centrality. We have unearthed the trend (not known until now) that the Pearson's correlation coefficient between decay centrality and degree centrality decreases with increase in the value of the decay parameter δ, and that the Pearson's correlation coefficient between decay centrality and closeness centrality decreases with decrease in δ. We have developed an efficient binary search algorithm that makes use of this phenomenon to determine the existence (or the lack of it) of one or more δ values (collectively referred to as δ-space r) for which PCC(DEC δ, DEG) ≥ r and PCC(DEC δ, CLC) ≥ r for a threshold PCC (r). We also demonstrate the use of the binary search algorithm to determine the maximum threshold PCC (r max) value that could be observed between DEC and DEG as well as between DEC and CLC for a real-world network. We observe this r max value to be appreciably larger than PCC(DEG, CLC) for most of the real-world networks, and also show that it could be accurately predicted using the latter. One could thus run the binary search algorithm in the vicinity of the predicted r max value for a real-world network and determine a δ value for which we observe the maximum threshold PCC between DEC and DEG as well as between DEC and CLC.

Conclusions
Our contributions in this paper are as follows. For each of the 48 real-world networks (of diverse degree distributions) analyzed in this paper, we observe the Pearson's Correlation Coefficient (PCC) between degree centrality (DEG) and decay centrality (DEC) to monotonically decrease with increase in the decay parameter (δ), and the PCC between closeness centrality (CLC) and decay centrality to monotonically increase with increase in δ. We have explored this phenomenon and proposed a binary search algorithm that could be used (for a given threshold PCC r) to determine the existence of a positive δ-space r (or the absence of the same) comprising one or more δ values for which PCC(DEC δ, DEG) ≥ r and PCC(DEC δ, CLC) ≥ r. In addition, we show the use of the binary search algorithm to determine the maximum threshold PCC (r max) value for a real-world network and the prediction of the same using PCC(DEG, CLC). The r max value for a real-world network is a measure of the extent to which the degree centrality or closeness centrality metrics could serve as alternatives to the decay centrality metric and vice-versa. If the predicted r max value for a real-world network is high, then one could run the binary search algorithm in the vicinity of the r max value to determine a value of δ (δ rmax) for which PCC(DEC δ rmax, DEG) = r max and PCC(DEC δ rmax, CLC) = r max. As vertices with large decay centrality are preferable for diffusion, our approach of determining the r max value and the corresponding δ rmax value for a real-world network using the proposed binary search algorithm could bring significant savings in the process of exploring a suitable δ value for which DEC exhibits the largest correlation coefficient value with both DEG and CLC.
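To make the meaning of r max concrete: by definition it is the largest threshold r for which some δ satisfies both PCC conditions, which is the same as the maximum over δ of the smaller of the two PCC values. The following is a minimal brute-force sketch of that definition (a grid sweep, not the binary search algorithm proposed in this paper). The function names, the grid step of 0.01, the hop-count BFS, and the use of the reciprocal of the sum of hop distances as closeness centrality are all assumptions made for illustration; the graph is assumed to be connected, undirected, and to have non-constant degree.

```python
from collections import deque
from math import sqrt


def hop_distances(adj, src):
    """Breadth-first search: hop count from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def pearson(x, y):
    """Pearson's correlation coefficient (undefined if x or y is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_max_by_sweep(adj, step=0.01):
    """Illustrative only: sweep delta over a grid and return the pair
    (largest min(PCC(DEC, DEG), PCC(DEC, CLC)), delta attaining it)."""
    nodes = sorted(adj)
    dist = {v: hop_distances(adj, v) for v in nodes}
    deg = [len(adj[v]) for v in nodes]
    # Assumed closeness definition: 1 / (sum of hop distances).
    clc = [1.0 / sum(dist[v].values()) for v in nodes]
    best_r, best_delta = -1.0, None
    for k in range(1, round(1 / step)):
        delta = k * step
        # Decay centrality: sum of delta^d(v, u) over all other vertices u.
        dec = [sum(delta ** d for u, d in dist[v].items() if u != v)
               for v in nodes]
        r = min(pearson(dec, deg), pearson(dec, clc))
        if r > best_r:
            best_r, best_delta = r, delta
    return best_r, best_delta
```

Such a sweep needs one decay centrality computation per grid point, which is the cost the binary search is meant to reduce.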

Figure 2. Degree Centrality of the Vertices in an Example Graph

Figure 3. Closeness Centrality of the Vertices in an Example Graph

Figure 4. Decay Centrality of the Vertices in an Example Graph

Figure 5. Sample Illustration of the Computation of the Pearson's Correlation Coefficient between Degree Centrality and Closeness Centrality

Figure 6. Distribution of the Pearson's Correlation Coefficient Values between Decay Centrality and the Two Centrality Metrics (Degree Centrality and Closeness Centrality) vs. the Decay Parameter δ for the Example Graph of Figures 2-5

δ-space r for a threshold PCC (r) basically quantifies the range of values in the open interval [

Figure 7. Shrinking δ-Space r with Increase in Threshold PCC (r) for the Example Graph of Figures 2-5

Figure 8. Pseudo Code for the Proposed Binary Search Algorithm to Determine the Existence of δ-Space r and Maximum Threshold PCC for DEC-DEG and DEC-CLC Correlation

to 0.01 for lower values for the threshold PCC r.

Figure 9. Example to Illustrate the Execution of the Optimization Step (Initial Test with the Left Index) of the Binary Search Algorithm to Determine the Existence of δ-space r (Threshold PCC r = 0.75)

Figure 10. Example to Illustrate the Iterations of a Successful Search of the Binary Search Algorithm to Determine the Existence of δ-space r (Threshold PCC r = 0.90)

Figure 11. Example to Illustrate the Iterations of an Unsuccessful Search of the Binary Search Algorithm to Determine the Existence of δ-space r (Threshold PCC r = 0.95)

Figure 12. Binary Search Algorithm: δ-space r vs. # Decay Centrality Computations for Different Values of the Threshold Pearson's Correlation Coefficient (r)

Figure 14. Real-World Networks in the Decreasing Order of δ-space r≡vs+ Values (0 < δ-space r≡vs+ < 0.99)

Figure 16.Samples of the PCC Distributions for DEC vs. DEG and DEC vs. CLC and their Intersection for Real-World Networks in the Decreasing Order of the r max Values

Figure 17. Spectral Radius Ratio for Node Degree vs. Maximum Threshold PCC (r max) for Real-World Networks

The sub-figures of Figure 16 also stand as testimony to our earlier statement about the results of our initial correlation study: PCC(DEC δ, DEG) monotonically decreases with increase in δ, and PCC(DEC δ, CLC) monotonically increases with increase in δ. It is this phenomenon that forms the backbone of our binary search algorithm. It enabled us to initially set up the left and right indexes to correspond to the δ values at which we observe the maximum PCC value with DEG and CLC respectively, and later move these indexes towards each other while maintaining the invariant that the left index always corresponds to a δ value at which PCC(DEC δ, DEG) is greater than or equal to the threshold PCC (r), and likewise the right index always corresponds to a δ value at which PCC(DEC δ, CLC) is greater than or equal to the threshold PCC (r). The algorithm seeks to find a middle index (if δ-space r is positive), corresponding to the average of the δ values represented by the left index and the right index, such that PCC(DEC middle index, DEG) and PCC(DEC middle index, CLC) are both greater than or equal to the threshold PCC (r). If the algorithm comes across a middle index for which both PCC(DEC middle index, DEG) and PCC(DEC middle index, CLC) are less than the threshold PCC (r), it is guaranteed (due to the monotonically non-increasing and non-decreasing trends of the two PCC values) that the δ-space r for the real-world network is negative.
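The search described above can be sketched as follows. This is a minimal illustration under stated assumptions and not a transcription of the pseudo code of Figure 8: the function names, the end points ε = 0.01 and 1 - ε, the stopping tolerance, the hop-count BFS, and the use of the reciprocal of the sum of hop distances as closeness centrality are all choices made here for illustration. The graph is assumed to be connected and undirected with non-constant degree, and the two PCC curves are assumed to be monotonic in δ as reported above.

```python
from collections import deque
from math import sqrt


def build_adj(edges):
    """Undirected adjacency lists from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def hop_distances(adj, src):
    """Breadth-first search: hop count from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def pearson(x, y):
    """Pearson's correlation coefficient (undefined if x or y is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def make_pcc_fn(adj):
    """Return f(delta) -> (PCC(DEC_delta, DEG), PCC(DEC_delta, CLC))."""
    nodes = sorted(adj)
    dist = {v: hop_distances(adj, v) for v in nodes}
    deg = [len(adj[v]) for v in nodes]
    # Assumed closeness definition: 1 / (sum of hop distances).
    clc = [1.0 / sum(dist[v].values()) for v in nodes]

    def pccs(delta):
        # Decay centrality: sum of delta^d(v, u) over all other vertices u.
        dec = [sum(delta ** d for u, d in dist[v].items() if u != v)
               for v in nodes]
        return pearson(dec, deg), pearson(dec, clc)

    return pccs


def find_delta(pccs, r, eps=0.01, tol=0.01):
    """Return a delta with both PCC values >= r, or None if none is found."""
    left, right = eps, 1.0 - eps
    p_deg, p_clc = pccs(left)       # initial test with the left index
    if p_deg < r:
        return None                 # even the best DEC-DEG value is below r
    if p_clc >= r:
        return left
    p_deg, p_clc = pccs(right)      # initial test with the right index
    if p_clc < r:
        return None                 # even the best DEC-CLC value is below r
    if p_deg >= r:
        return right
    # Invariant: left satisfies only the DEG test, right only the CLC test.
    while right - left > tol:
        mid = (left + right) / 2.0
        p_deg, p_clc = pccs(mid)
        if p_deg >= r and p_clc >= r:
            return mid              # both conditions hold at mid
        if p_deg < r and p_clc < r:
            return None             # no overlap under the monotonic trend
        if p_deg >= r:
            left = mid              # only the DEG test holds: move left up
        else:
            right = mid             # only the CLC test holds: move right down
    return None
```

For example, `find_delta(make_pcc_fn(build_adj(edges)), 0.90)` returns a δ value for a hypothetical edge list `edges` if one is found within the tolerance, and `None` otherwise. Each loop iteration costs one decay centrality computation, so the number of computations grows only logarithmically with the resolution of the δ-search space.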

Figure 19. Distribution of Actual vs. Predicted r max Values for the Real-World Networks

Table 1. δ-space r Values for the Example Graph of Figures 2-5 for Different Values of Threshold PCC (r)

Binary Search Algorithm to Determine the Existence of δ-Space r and Maximum Threshold PCC for DEC-DEG and DEC-CLC Correlation
We now describe a binary search algorithm (see Figure 8 for the pseudo code) whose objective is (given a threshold PCC value of r) to find a particular value of the decay parameter δ (if one exists) such that PCC(DEC δ, DEG) ≥ r and PCC(DEC δ, CLC) ≥ r. If a δ value could be found for a real-world network, it implies there does exist an overlap of the intervals [0+ ε ... max d... 1-ε ] and we could conclude that the δ-space r for the real-world network exists (i.e., δ-space r > 0). If a δ value could not be found, it implies the intervals [0+ ε ...

Table 3. δ-space r and the Number of Decay Centrality Computations of Real-World Networks for Different Values of the Threshold PCC (r)

Table 4. Maximum Threshold PCC (r max) Values and Corresponding δ rmax Values for the Real-World Networks