Built-in Algorithms#
The graph analytical engine (GAE) of GraphScope offers many built-in algorithms, which enable users to analyze their graph data with least effort. The built-in algorithms covers a wide range of applications, such as the shortest path, community detection, clustering, etc. The built-in algorithms are implemented with the PIE programming model and highly optimized for the best performance, and users can use them in the out-of-box manner.
Here is the full list of supported built-in algorithms:
All Pairs Shortest Path Length#
This algorithm is used to find all pair shortest path problem from a given weighted graph. As a result of this algorithm, it will compute the minimum distance from any vertex to all other vertices in the graph.
- all_pairs_shortest_path_length()#
Compute the shortest path lengths between all vertices in the graph.
Attribute Assortativity#
Assortativity in a graph refers to the tendency of vertices to connect with other similar vertices over dissimilar vertices. Attribute assortativity is a measure of the extent to which vertices with the same properties connect to each other.
- attribute_assortativity()#
Compute assortativity for vertex attributes.
Average Degree Connectivity#
The average degree connectivity is the average nearest neighbor degree of vertices with degree k. This algorithm returns a list of k-average degree connectivity for a graph for successive k=0,1,2….
- average_degree_connectivity(source_degree_type, target_degree_type)#
Compute the average degree connectivity of the graph.
- Parameters:
source_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for source vertex
target_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for target vertex
Betweenness Centrality#
Betweenness centrality is a measure of centrality in a graph based on shortest paths. Betweenness centrality of a vertex v is the sum of the fraction of all-pairs shortest paths that pass through v.
- betweenness_centrality(normalized, endpoints)#
Compute the shortest-path betweenness centrality for vertices.
- Parameters:
normalized (bool) – whether the result should be normalized
endpoints (bool) – whether including the endpoints in the shortest path counts
Breadth-First Search#
Breadth-First Search (BFS) is an algorithm for traversing or searching graph data. It starts at a root vertex and explores all vertices at the present depth prior to moving on to the vertices at the next depth level.
- bfs(source)#
Compute a list of vertices in breadth-first search from source.
- Parameters:
source (int) – The source vertex for breadth-first search
CDLP#
Evaluate Community Detection with Label Propagation, Also known as LPA. See LPA
- cdlp(max_round=10)#
Evaluate Community Detection with Label Propagation
- Parameters:
max_round (int) – Maximum rounds, default to 10
Closeness Centrality#
The original closeness centrality of a vertex v is the reciprocal of the average shortest path distance to v over all n-1 reachable nodes. Wasserman and Faust proposed an improved formula for graphs with more than one connected component. The result is “a ratio of the fraction of actors in the group who are reachable, to the average distance” from the reachable actors.
- closeness_centrality(wf)#
Compute closeness centrality for vertices.
- Parameters:
wf (bool) – whether the Wasserman and Faust improved formula is used
Clustering#
The clustering algorithm is to compute the clustering coefficient for each vertex of a graph. The clustering coefficient of a vertex in a graph quantifies how close its neighbors are to being a clique (complete graph).
- clustering()#
Compute the clustering coefficient for each vertex in the graph.
Degree Assortativity Coefficient#
Similar to attribute assortativity, degree assortativity coefficient measures tendency of having an edge (u,v) such that, degree(u) equals to degree(v).
- degree_assortativity_coefficient(source_degree_type, target_degree_type, weighted)#
Compute degree assortativity coefficient of the graph.
- Parameters:
source_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for source vertex
target_degree_type (str) – use “in”-,“out”- or ”in+out”-degree for target vertex
weight (bool) – The edge attribute that holds the numerical value used as a weight. If
False
, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the vertex.
Degree Centrality#
The degree centrality for a vertex v is the fraction of nodes it is connected to. If the graph is directed, then three separate measures of degree centrality are defined, namely, in-degree, out-degree and both in- and out-degree.
- degree_centrality(centrality_type)#
Compute the degree centrality for vertices.
- Parameters:
centrality_type (str) –
in
,out
orboth
degree are applied
Depth-First Search#
Depth-First-Search (DFS) is an algorithm for traversing or searching graph data. It starts at a root vertex and explores as far as possible along each branch before backtracking.
- dfs(source)#
Compute a list of vertices in depth-first search from source.
- Parameters:
source (int) – The source vertex for depth-first search
Eigenvector Centrality#
Eigenvector centrality is a measure of the influence of a vertex in a graph. Relationships originating from high-scoring vertices contribute more to the score of a vertex than connections from low-scoring vertices. A high eigenvector score means that a vertex is connected to many vertices who themselves have high scores.
- eigenvector_centrality(tolerance, max_round)#
Compute the eigenvector centrality for the graph.
- Parameters:
tolerance (double) – error tolerance used to check convergence
max_round (int) – maximum number of iterations
Hyperlink-Induced Topic Search#
Hyperlink-Induced Topic Search (HITS) is an iterative algorithm to compute the importance of each vertices in a graph. It rates vertices based on two scores, a hub score and an authority score. The authority score estimates the importance of the vertex within the graph. The hub score estimates the value of its relationships to other vertices.
- hits(tolerance, max_round, normalized)#
Compute the HITS value for each vertex in the graph.
- Parameters:
tolerance (double) – error tolerance used to check convergence
max_round (int) – maximum number of iterations
normalized (bool) – whether we need to normalize the resulting values
Katz Centrality#
Katz centrality computes the relative influence of a vertex within a graph by measuring the number of the immediate neighbors (first degree vertices) and also all other vertices in the graph that connect to the vertex under consideration through these immediate neighbors.
- katz_centrality(alpha, beta, tolerance, max_round, normalized)#
Compute the Katz centrality for the vertices of the graph.
- Parameters:
alpha (double) – attenuation factor
beta (double) – weight attributed to the immediate neighborhood
tolerance (double) – error tolerance used to check convergence
max_round (int) – maximum number of iterations
normalized (bool) – whether we need to normalize the resulting values
K-Core#
This algorithm is to find a k-core from an original graph, and a k-core is a maximal subgraph that contains vertices of degree k or more. The k-core is found by recursively pruning vertices with degrees less than k.
- kkore(k)#
Compute the k-core of the graph.
- Parameters:
k (int) – The order of the core
K-Shell#
The k-shell is the subgraph induced by vertices with core number k. That is, vertices in the k-core that are not in the (k+1)-core.
- kshell(k)#
Compute the k-shell of the graph.
- Parameters:
k (int) – The order of the shell
Label Propagation Algorithm#
The Label Propagation Algorithm (LPA) is a fast algorithm for finding communities in a graph. It is an iterative algorithm where we assign labels to unlabelled vertices by propagating labels through the graph.
- lpa(max_round)#
Compute the label of each vertex in the graph.
- Parameters:
max_round (int) – the number of iterations during the computation
LCC#
The LCC is aims to compute the local clustering coefficient, follow the specification of LDBC
- lcc()#
Compute the local clustering coefficient for each vertex in the graph.
PageRank#
PageRank is a way of measuring the importance of each vertices in a graph. The PageRank algorithm exists in many variants, the PageRank in GAE follows the PageRank implementation of NetworkX.
- pagerank(alpha, max_round, tolerance)#
Compute the PageRank value for each vertex in the graph.
- Parameters:
alpha (double) – damping parameter for PageRank
max_round (int) – maximum number of iterations
tolerance (double) – error tolerance used to check convergence
Sampling Path#
This algorithm samples a set of paths starting from a root vertex with path length.
- sampling_path(source_id, cutoff)#
Compute a set of paths starting from a root vertex.
- Parameters:
source_id (int) – the source vertex of path sampling
cutoff (int) – maximum length of paths
Single-Source Shortest Paths#
The Single-Source Shortest Paths (SSSP) fixes a single vertex as the “source” vertex and finds shortest paths from the source to all other vertices in the graph.
- sssp(source)#
Compute shortest paths from a source vertex in the graph.
- Parameters:
source (int) – The source vertex for the shortest paths
VoteRank#
VoteRank is to measure a ranking of the vertices in a graph based on a voting scheme. With VoteRank, all vertices vote for each of its in-neighbors and the vertices with the top highest votes is elected iteratively.
- voterank(num_of_nodes)#
Select a list of influential vertices in a graph using VoteRank algorithm.
- Parameters:
num_of_nodes (int) – the number of ranked vertices to extract
WCC#
Compute the weakly connected component
- wcc()#
Compute the weakly connected component in the graph.