Analytical App
AppAssets
- class graphscope.framework.app.AppAssets(algo, context=None, gar=None)[source]
A class that represents an app asset node in a DAG and holds the bytes of the gar resource.
The assets include an algorithm name, a gar (for a user-defined algorithm), a context type (one of ‘tensor’, ‘vertex_data’, ‘vertex_property’, ‘labeled_vertex_data’, ‘dynamic_vertex_data’, ‘labeled_vertex_property’), and an algorithm type (one of cpp_pie, cython_pie, java_pie, cython_pregel).
An instance of this class can be passed to the initializer of graphscope.framework.app.AppDAGNode.
- __init__(algo, context=None, gar=None)[source]
Init assets of the algorithm.
- Parameters
algo (str) – The specific algorithm inside the resource.
context (str) – Type of context that holds the calculation results. It is read from the gar if this parameter is None. Defaults to None.
gar (bytes or BytesIO, optional) – The bytes that encode the application’s source code. Defaults to None.
- property algo
Algorithm name, e.g. sssp, pagerank.
- Returns
Algorithm name of this asset.
- Return type
str
- property context_type
Context type, e.g. vertex_property, labeled_vertex_data.
- Returns
Type of the app context.
- Return type
str
- property gar
Gar resource.
- Returns
gar resource of this asset.
- Return type
bytes
- is_compatible(graph)[source]
Determine if this algorithm can run on this type of graph.
- Parameters
graph (GraphDAGNode) – A graph instance.
- Raises
InvalidArgumentError – App is not compatible with graph.
ScannerError – Yaml file format is incorrect.
- property signature
Generate a signature of the app assets from the algo name (and the gar resource, if any).
Used to uniquely identify an app asset.
- Returns
Signature of this asset.
- Return type
str
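The idea of deriving a signature from the algo name and the optional gar bytes can be sketched as below. This is a minimal illustration only: the helper name asset_signature and the choice of SHA-256 are assumptions, not the documented behavior of AppAssets.signature.

```python
import hashlib


def asset_signature(algo, gar=None):
    """Hypothetical helper: hash the algo name and, if present, the gar bytes."""
    digest = hashlib.sha256()
    digest.update(algo.encode("utf-8"))
    if gar is not None:
        # A user-defined algorithm contributes its packaged bytes as well.
        digest.update(gar)
    return digest.hexdigest()


builtin_sig = asset_signature("sssp")
custom_sig = asset_signature("sssp", gar=b"fake gar payload")
print(builtin_sig != custom_sig)
```

Two assets with the same algo name and the same gar bytes map to the same string, which is what makes reuse of a previously built library possible.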
- property type
Algorithm type, one of cpp_pie, cython_pie, java_pie or cython_pregel.
- Returns
Algorithm type of this asset.
- Return type
str
JavaApp
- class graphscope.analytical.app.JavaApp(full_jar_path: str, java_app_class: str)[source]
A class that represents a Java app asset node in a DAG and holds the jar file.
It holds the necessary resources to run a Java app, including the Java class path, the gar file (which consists of the jar and a configuration yaml), and the specified Java class. When a JavaApp is created, GraphScope tries to load the specified Java class and parse the base class of your app and the base class of your context class. This operation requires a Java runtime environment installed on the client machine where your GraphScope session is created.
To run your app, provide JavaApp with a property or projected graph and your query arguments.
- __call__(graph: graphscope.framework.graph.Graph, *args, **kwargs)[source]
Instantiate an App and do queries over it.
- __init__(full_jar_path: str, java_app_class: str)[source]
Init JavaApp with the full path of your jar file and the fully-qualified name of your app class.
- Parameters
full_jar_path (str) – The path where the jar file exists.
java_app_class (str) – The fully-qualified name of your app class.
App object
- class graphscope.framework.app.AppDAGNode(graph, app_assets: graphscope.framework.app.AppAssets)[source]
A class that represents an app node in a DAG.
In GraphScope, an app node is bound to the concrete graph node on which the query is executed.
- class graphscope.framework.app.App(app_node, key)[source]
An application that can run on graphs and produce results.
The analytical engine builds the app's dynamic library when an app instance is instantiated. The dynamic library is reused if a subsequent app’s signature matches that of a previously built one.
- property key
A unique identifier of App.
- property signature
Signature is computed by all critical components of the App.
Functions
Load an app from gar.
BuiltIn apps
- graphscope.bfs(graph, src=0)[source]
Breadth-first search from src on a projected simple graph.
- Parameters
graph (graphscope.Graph) – A simple graph.
src (optional) – Source vertex of the breadth-first search. The type should be consistent with the id type of the graph, that is, int or str depending on whether the graph's oid_type is int64_t or string. Defaults to 0.
- Returns
A context in which each vertex is assigned its distance from the source, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.bfs(pg, src=6)
>>> sess.close()
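To make the meaning of the returned distances concrete, here is a small single-machine sketch of breadth-first search in plain Python. It is not how GraphScope computes the result; the function name bfs_distances and the toy adjacency dictionary are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adjacency, src):
    """Return the hop distance from src for every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:  # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical toy graph, not the p2p dataset used above.
toy_graph = {6: [7, 8], 7: [9], 8: [9], 9: []}
distances = bfs_distances(toy_graph, src=6)
print(distances)
```

Vertices that cannot be reached from src are simply absent from the result in this sketch.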
- graphscope.pagerank(graph, delta=0.85, max_round=10)[source]
Evaluate PageRank on a graph.
- Parameters
graph (graphscope.Graph) – A simple graph.
delta (float, optional) – Damping factor. Defaults to 0.85.
max_round (int, optional) – Maximum number of rounds. Defaults to 10.
- Returns
A context with each vertex assigned with the pagerank value, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.pagerank(pg, delta=0.85, max_round=10)
>>> sess.close()
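The roles of delta and max_round can be illustrated with a textbook power-iteration sketch. This is a plain-Python reference under stated assumptions (uniform initial ranks, dangling mass spread evenly, a fixed number of rounds), and it may differ in detail from the built-in app; pagerank_reference and the toy graph are hypothetical names.

```python
def pagerank_reference(out_edges, delta=0.85, max_round=10):
    """Textbook PageRank: out_edges maps every vertex to its list of targets."""
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_round):
        # Rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not out_edges[v])
        nxt = {v: (1.0 - delta) / n + delta * dangling / n for v in vertices}
        for u in vertices:
            targets = out_edges[u]
            if targets:
                share = delta * rank[u] / len(targets)
                for v in targets:
                    nxt[v] += share
        rank = nxt
    return rank


# Hypothetical toy graph; every target also appears as a key.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_reference(toy_graph, delta=0.85, max_round=10)
print(ranks)
```

A larger delta gives more weight to link structure and less to the uniform jump; max_round bounds the work regardless of convergence.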
- graphscope.sssp(graph, src=0, weight=None)[source]
Compute single source shortest path length on the graph.
Note that the sssp algorithm requires a numerical property on the edge.
- Parameters
graph (graphscope.Graph) – A simple graph.
src (optional) – The source vertex. The type should be consistent with the id type of the graph, that is, int or str depending on whether the graph's oid_type is int64_t or string. Defaults to 0.
weight (str, optional) – The edge data key corresponding to the edge weight. Note that the property should have the same index under every label. Defaults to None.
- Returns
A context with each vertex assigned with the shortest distance from the src, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.sssp(pg, src=6)
>>> sess.close()
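The quantity that sssp assigns to each vertex is the length of the shortest weighted path from src. A minimal single-machine sketch using Dijkstra's algorithm is shown below; it assumes non-negative edge weights, and the names sssp_reference and toy_graph are illustrative, not part of GraphScope.

```python
import heapq


def sssp_reference(weighted_adj, src):
    """Dijkstra's algorithm; weighted_adj maps a vertex to (target, weight) pairs."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in weighted_adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy graph with non-negative weights.
toy_graph = {6: [(7, 1.0), (8, 4.0)], 7: [(8, 2.0)], 8: []}
shortest = sssp_reference(toy_graph, src=6)
print(shortest)
```

Here the direct edge from 6 to 8 costs 4.0, but the path through 7 costs 3.0, so the shorter value is kept.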
- graphscope.wcc(graph)[source]
Evaluate weakly connected components on the graph.
- Parameters
graph (graphscope.Graph) – A simple graph.
- Returns
A context with each vertex assigned with the component ID, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.wcc(pg)
>>> sess.close()
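Weakly connected components ignore edge direction: two vertices share a component ID if they are linked by any chain of edges. The sketch below illustrates this with a union-find structure in plain Python. Using the smallest vertex id as the component ID is an assumption made for this example, and wcc_reference is a hypothetical name.

```python
def wcc_reference(vertices, edges):
    """Label each vertex with the smallest vertex id in its weak component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:  # edge direction is ignored
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller id as root so labels are deterministic.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv
    return {v: find(v) for v in vertices}


components = wcc_reference([1, 2, 3, 4, 5], [(1, 2), (3, 2), (4, 5)])
print(components)
```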
- graphscope.avg_clustering(graph, degree_threshold=1000000000)[source]
Compute the average clustering coefficient for the directed graph.
- Parameters
graph (graphscope.Graph) – A simple graph.
degree_threshold (int, optional) – Filter out super vertices whose degree is greater than the threshold. Defaults to 1e9.
- Returns
r (float) – The average clustering coefficient.
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.avg_clustering(pg)
>>> print(c.to_numpy("r", axis=0)[0])
>>> sess.close()
- graphscope.clustering(graph, degree_threshold=1000000000)[source]
The local clustering coefficient of a node in a graph is the fraction of pairs of the node’s neighbors that are adjacent to each other.
- Parameters
graph (graphscope.Graph) – A simple graph.
degree_threshold (int, optional) – Filter out super vertices whose degree is greater than the threshold. Defaults to 1e9.
- Returns
A context with each vertex assigned the computed clustering value, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.clustering(pg)
>>> sess.close()
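The definition above, the fraction of neighbor pairs that are themselves adjacent, can be worked through on a tiny graph. The sketch below treats the graph as undirected and ignores degree_threshold; both are simplifying assumptions, and local_clustering is a hypothetical helper rather than the built-in app.

```python
from itertools import combinations


def local_clustering(neighbors):
    """neighbors maps each vertex to the set of its (undirected) neighbors."""
    result = {}
    for v, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0  # no neighbor pairs exist
            continue
        # Count neighbor pairs that are connected to each other.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        result[v] = 2.0 * links / (k * (k - 1))
    return result


# Triangle 1-2-3 with an extra vertex 4 attached to 3.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
coeffs = local_clustering(toy_graph)
print(coeffs)
```

Vertex 3 has three neighbor pairs but only the pair (1, 2) is adjacent, so its coefficient is one third.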
- graphscope.degree_centrality(graph, centrality_type='both')[source]
The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph, n-1, where n is the number of nodes in the graph.
- Parameters
graph (Graph) – A simple graph.
centrality_type (str, optional) – Available options are in/out/both. Defaults to “both”.
- Returns
A context with each vertex assigned with the computed degree centrality, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.degree_centrality(pg, centrality_type="both")
>>> sess.close()
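The n-1 normalization and the in/out/both options can be illustrated with a short plain-Python sketch. Treating "both" as the sum of in-degree and out-degree is an assumption of this example, and degree_centrality_reference is a hypothetical name.

```python
def degree_centrality_reference(vertices, edges, centrality_type="both"):
    """Degree of each vertex divided by n-1, for a list of directed edges."""
    if centrality_type not in ("in", "out", "both"):
        raise ValueError("centrality_type must be 'in', 'out' or 'both'")
    in_deg = {v: 0 for v in vertices}
    out_deg = {v: 0 for v in vertices}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    n = len(vertices)
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    result = {}
    for v in vertices:
        if centrality_type == "in":
            degree = in_deg[v]
        elif centrality_type == "out":
            degree = out_deg[v]
        else:
            # Assumption: "both" counts incoming plus outgoing edges.
            degree = in_deg[v] + out_deg[v]
        result[v] = degree * scale
    return result


toy_vertices = [1, 2, 3]
toy_edges = [(1, 2), (1, 3), (2, 3)]
out_centrality = degree_centrality_reference(toy_vertices, toy_edges, "out")
both_centrality = degree_centrality_reference(toy_vertices, toy_edges, "both")
print(out_centrality, both_centrality)
```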
- graphscope.eigenvector_centrality(graph, tolerance=1e-06, max_round=100, weight=None)[source]
Compute the eigenvector centrality for the graph. See more about eigenvector centrality here: https://networkx.org/documentation/networkx-1.10/reference/generated/networkx.algorithms.centrality.eigenvector_centrality.html
- Parameters
graph (graphscope.Graph) – A simple graph.
tolerance (float, optional) – Defaults to 1e-06.
max_round (int, optional) – Defaults to 100.
weight (str, optional) – The edge data key corresponding to the edge weight. Note that the property should have the same index under every label. Defaults to None.
- Returns
A context with each vertex assigned its eigenvector centrality, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.eigenvector_centrality(pg, tolerance=1e-06, max_round=10)
>>> sess.close()
- graphscope.hits(graph, tolerance=0.01, max_round=100, normalized=True)[source]
Compute HITS on graph.
Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages. See more here: https://en.wikipedia.org/wiki/HITS_algorithm
- Parameters
graph (graphscope.Graph) – A simple graph.
tolerance (float, optional) – Defaults to 0.01.
max_round (int, optional) – Defaults to 100.
normalized (bool, optional) – Whether to normalize the result to 0-1. Defaults to True.
- Returns
A context with each vertex assigned with the HITS value, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.hits(pg, tolerance=0.01, max_round=10, normalized=True)
>>> sess.close()
- graphscope.k_core(graph, k: int)[source]
K-cores of the graph are connected components that are left after all vertices of degree less than k have been removed.
- Parameters
graph (graphscope.Graph) – A simple graph.
k (int) – The order of the core.
- Returns
A context with each vertex assigned a boolean: 1 if the vertex satisfies k-core, otherwise 0. Evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.k_core(pg, k=3)
>>> sess.close()
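The removal process described above (repeatedly dropping vertices whose degree is below k) can be sketched in plain Python. The example treats the graph as undirected, which is an assumption, and k_core_membership is a hypothetical helper that mirrors the 1/0 output format.

```python
def k_core_membership(neighbors, k):
    """Return 1 for vertices that survive k-core peeling, otherwise 0."""
    degree = {v: len(nbrs) for v, nbrs in neighbors.items()}
    removed = set()
    stack = [v for v, d in degree.items() if d < k]
    while stack:
        v = stack.pop()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of its remaining neighbors.
        for u in neighbors[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return {v: 0 if v in removed else 1 for v in neighbors}


# Triangle 1-2-3 with an extra vertex 4 attached to 3.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
membership = k_core_membership(toy_graph, k=2)
print(membership)
```

With k=2 only the pendant vertex 4 is peeled away; with k=3 the removal cascades and no vertex remains.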
- graphscope.katz_centrality(graph, alpha=0.1, beta=1.0, tolerance=1e-06, max_round=100, normalized=True, degree_threshold=1000000000.0)[source]
Compute the Katz centrality.
See more details for Katz centrality here: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.katz_centrality_numpy.html
- Parameters
graph (graphscope.Graph) – A simple graph.
alpha (float, optional) – Attenuation factor. Defaults to 0.1.
beta (float, optional) – Weight attributed to the immediate neighborhood. Defaults to 1.0.
tolerance (float, optional) – Error tolerance. Defaults to 1e-06.
max_round (int, optional) – Maximum number of rounds. Defaults to 100.
normalized (bool, optional) – Whether to normalize result values. Defaults to True.
degree_threshold (int, optional) – Filter out super vertices whose degree is greater than the threshold. Defaults to 1e9.
- Returns
A context with each vertex assigned with the computed katz_centrality, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.katz_centrality(pg)
>>> sess.close()
- graphscope.lpa(graph, max_round=10)[source]
Evaluate Community Detection with Label Propagation.
- Parameters
graph (graphscope.Graph) – A simple graph.
max_round (int, optional) – Maximum number of rounds. Defaults to 10.
- Returns
A context with each vertex assigned a community ID, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.lpa(pg, max_round=10)
>>> sess.close()
- graphscope.triangles(graph)[source]
Evaluate triangle counting of the graph G.
- Parameters
graph (graphscope.Graph) – A simple graph.
- Returns
A context with each vertex assigned with the triangle counting result, evaluated in eager mode.
- Return type
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.triangles(pg)
>>> sess.close()
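A per-vertex triangle count is the number of triangles each vertex takes part in. The sketch below shows one common way to compute it on an undirected graph by intersecting the neighbor sets of each edge's endpoints. It is an illustration under that undirected assumption, and triangle_counts is a hypothetical name.

```python
def triangle_counts(neighbors):
    """neighbors maps each vertex to the set of its (undirected) neighbors."""
    counts = {v: 0 for v in neighbors}
    for u in neighbors:
        for v in neighbors[u]:
            if u < v:
                # Common neighbors of u and v close a triangle.
                for w in neighbors[u] & neighbors[v]:
                    if v < w:  # the ordering u < v < w counts each triangle once
                        counts[u] += 1
                        counts[v] += 1
                        counts[w] += 1
    return counts


# Triangle 1-2-3 with an extra vertex 4 attached to 3.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
per_vertex = triangle_counts(toy_graph)
print(per_vertex)
```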
- graphscope.louvain(graph, min_progress=1000, progress_tries=1)[source]
Compute the best partition of the graph using the Louvain method.
- Parameters
graph (graphscope.Graph) – A simple undirected graph.
min_progress – The minimum delta X required to be considered progress, where X is the number of nodes that have changed their community on a particular pass. Delta X is then the difference in the number of nodes that changed communities on the current pass compared to the previous pass.
progress_tries – Number of times the min_progress setting is not met before exiting from the current level and compressing the graph.
- Returns
A context with each vertex assigned with id of community it belongs to, evaluated in eager mode.
- Return type
References
[1] Blondel, V.D. et al. Fast unfolding of communities in large networks. J. Stat. Mech. 10008, 1-12 (2008).
[2] https://github.com/Sotera/distributed-graph-analytics
[3] https://sotera.github.io/distributed-graph-analytics/louvain/
Notes
louvain currently supports only undirected graphs. If the input graph is directed, louvain raises an InvalidArgumentError.
Examples:
>>> import graphscope
>>> from graphscope.dataset import load_p2p_network
>>> sess = graphscope.session(cluster_type="hosts", mode="eager")
>>> g = load_p2p_network(sess, directed=False)
>>> # project to a simple graph (if needed)
>>> pg = g.project(vertices={"host": ["id"]}, edges={"connect": ["dist"]})
>>> c = graphscope.louvain(pg, min_progress=1000, progress_tries=1)
>>> sess.close()