Graph
Graph object
-
class graphscope.framework.graph.Graph(session_id, incoming_data=None)
A class representing the metadata of a graph in GraphScope.
A Graph object holds the metadata of a graph, such as its key, its schema, and whether it is directed. Note that the graph itself is stored by a backend such as the Analytical Engine or Vineyard. In other words, the Graph object holds nothing but metadata.
The graph object should not be created directly from Graph. Instead, the graph should be created by Session.load_from. The following example demonstrates its usage:
>>> import graphscope as gs
>>> from graphscope.framework.loader import Loader
>>> sess = gs.session()
>>> g = sess.load_from(
...     edges={
...         "knows": (
...             Loader("{}/p2p-31_property_e_0".format(property_dir), header_row=True),
...             ["src_label_id", "dst_label_id", "dist"],
...             ("src_id", "person"),
...             ("dst_id", "person"),
...         ),
...     },
...     vertices={
...         "person": Loader(
...             "{}/p2p-31_property_v_0".format(property_dir), header_row=True
...         ),
...     }
... )
-
__init__(session_id, incoming_data=None)
Construct a Graph object.
- Parameters
session_id (str) – Session id of the session the graph is created in.
incoming_data – A Graph can be initialized from various types of sources. This argument can be one of:
GraphDef
nx.Graph
VineyardObject
-
add_column(results, selector)
Add the results as a column to the graph. Modification rules are given by the selector.
- Parameters
results (Context) – A Context created by running a query.
selector (dict) – Selects the results to add as a column. The format is similar to that of the selectors in Context.
- Returns
A new Graph with new columns.
- Return type
Graph
-
attach_interactive_instance(instance)
Store the instance when a new interactive instance is started.
- Parameters
instance – interactive instance
-
attach_learning_instance(instance)
Store the instance when a new learning instance is created.
- Parameters
instance – learning instance
-
detach()
Detaching a graph keeps it in vineyard even after the variable for this Graph object leaves the lexical scope. The graph can be accessed later using its ObjectID or its name.
-
property graph_type
The type of the graph object.
- Returns
the type of the graph.
- Return type
type (types_pb2.GraphType)
-
property key
The key of the corresponding graph in the engine.
-
property op
The DAG op of this graph.
-
project_to_simple(v_label='_', e_label='_', v_prop=None, e_prop=None)
Project a property graph to a simple graph, which is useful for the analytical engine. Labels and properties given by name are translated to indices, which are broadly used inside the engine.
- Parameters
v_label (str, optional) – vertex label to project. Defaults to “_”.
e_label (str, optional) – edge label to project. Defaults to “_”.
v_prop (str, optional) – vertex property of the v_label. Defaults to None.
e_prop (str, optional) – edge property of the e_label. Defaults to None.
- Returns
A Graph instance whose graph_type is ARROW_PROJECTED.
- Return type
Graph
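To make the idea of projection concrete, here is a minimal pure-Python sketch. It is not GraphScope's implementation and does not call the GraphScope API: the function `project_to_simple_sketch`, the dict-based graph layout, and the sample data are all hypothetical. It keeps one vertex label and one edge label, with at most one property for each.

```python
def project_to_simple_sketch(vertices, edges, v_label, e_label, v_prop=None, e_prop=None):
    """Toy projection over plain dicts (hypothetical layout, not the GraphScope API).

    vertices: {label: {vertex_id: {prop_name: value}}}
    edges:    {label: [(src_id, dst_id, {prop_name: value})]}
    """
    # Keep only vertices of the chosen label, carrying at most one property value.
    simple_vertices = {
        vid: (props[v_prop] if v_prop is not None else None)
        for vid, props in vertices[v_label].items()
    }
    # Keep only edges of the chosen label whose endpoints survived the projection.
    simple_edges = [
        (src, dst, props[e_prop] if e_prop is not None else None)
        for src, dst, props in edges[e_label]
        if src in simple_vertices and dst in simple_vertices
    ]
    return simple_vertices, simple_edges


# Hypothetical sample data with two vertex labels and two edge labels.
vertices = {
    "person": {1: {"age": 30}, 2: {"age": 41}},
    "city": {10: {"name": "Oslo"}},
}
edges = {
    "knows": [(1, 2, {"dist": 5})],
    "lives_in": [(1, 10, {"since": 2001})],
}
simple_v, simple_e = project_to_simple_sketch(
    vertices, edges, v_label="person", e_label="knows", v_prop="age", e_prop="dist"
)
```

After the call, only the "person" vertices and "knows" edges remain, each reduced to a single value.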
-
property schema
Schema of the graph.
- Returns
the schema of the graph
- Return type
GraphSchema
-
property schema_path
Path to which the Coordinator writes the interactive schema.
- Returns
The path that contains the schema for the interactive engine.
- Return type
str
-
property session_id
Get the current session_id.
- Returns
The id of the session that the graph belongs to.
- Return type
str
-
to_dataframe(selector, vertex_range=None)
Select some elements of the graph and output them as a pandas.DataFrame.
- Parameters
selector (dict) – Select some portions of graph.
vertex_range (dict, optional) – Slice vertices. Defaults to None.
- Returns
pandas.DataFrame
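The following pure-Python sketch illustrates how a selector dict and a vertex_range could combine to pick columns and rows. It does not call GraphScope. The helper `to_columns`, the selector strings "v.id" and "v.data", and the "begin"/"end" keys of vertex_range are assumptions made for illustration only; consult the Context selector documentation for the actual syntax.

```python
def to_columns(vertex_data, selector, vertex_range=None):
    """Toy stand-in for to_dataframe that returns {column_name: [values]}.

    vertex_data: {vertex_id: value}
    selector:    {column_name: "v.id" | "v.data"}  (assumed selector strings)
    """
    ids = sorted(vertex_data)
    if vertex_range is not None:
        # Assumed semantics: a half-open slice [begin, end) over vertex ids.
        ids = [v for v in ids if vertex_range["begin"] <= v < vertex_range["end"]]
    extract = {
        "v.id": lambda vid: vid,
        "v.data": lambda vid: vertex_data[vid],
    }
    # One output column per selector entry, in the selector's order.
    return {name: [extract[sel](vid) for vid in ids] for name, sel in selector.items()}


columns = to_columns(
    {1: 0.5, 2: 1.5, 3: 2.5, 4: 3.5},
    selector={"id": "v.id", "value": "v.data"},
    vertex_range={"begin": 2, "end": 4},
)
```

Each key of the selector becomes a column name, which mirrors how the returned DataFrame is expected to be labeled.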
-
to_numpy(selector, vertex_range=None)
Select some elements of the graph and output them as a numpy.ndarray.
- Parameters
selector (str) – Select a portion of graph as a numpy.ndarray.
vertex_range (dict, optional) – Slice vertices. Defaults to None.
- Returns
numpy.ndarray
-
property vineyard_id
Get the vineyard object_id of this graph.
- Returns
return vineyard id of this graph
- Return type
str
-
Loader object
-
class graphscope.framework.loader.Loader(source, delimiter=',', header_row=True, **kwargs)
Generic data source wrapper. A Loader can take various data sources and assemble the necessary information into an AttrValue.
-
__init__(source, delimiter=',', header_row=True, **kwargs)
Initialize a loader with configurable options. Note: a Loader cannot be reused, since it may change its inner state when constructing the information for loading a graph.
source – The data source to load, which can be one of the following:
local file: specified by URL file://...
oss file: specified by URL oss://...
hdfs file: specified by URL hdfs://...
s3 file: specified by URL s3://...
numpy ndarray, in CSR format
pandas dataframe
Ordinary data sources can also be loaded through a vineyard stream. When a vineyard:// prefix is used in the URL, the local file, oss object, or HDFS file is first loaded into a vineyard stream, and GraphScope’s fragment is then built upon those streams in vineyard. Once the stream IO in vineyard reaches a stable state, it will become the default mode for loading data sources and constructing fragments in GraphScope.
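As an illustration of the URL-based sources listed above, the sketch below classifies a source string by its scheme using only the standard library. The helper `describe_source` and the bucket and host names are hypothetical; this is not how Loader resolves sources internally, which is handled by drivers in libvineyard.

```python
from urllib.parse import urlparse

# Schemes taken from the list of supported URL sources above.
KNOWN_SCHEMES = {
    "file": "local file",
    "oss": "oss file",
    "hdfs": "hdfs file",
    "s3": "s3 file",
    "vineyard": "vineyard stream",
}


def describe_source(url):
    """Return a human-readable kind for a source URL (hypothetical helper)."""
    scheme = urlparse(url).scheme
    if scheme not in KNOWN_SCHEMES:
        raise ValueError("unsupported source scheme: {!r}".format(scheme))
    return KNOWN_SCHEMES[scheme]


kinds = [
    describe_source("file:///home/admin/student.v"),
    describe_source("oss://some-bucket/student.v"),   # hypothetical bucket
    describe_source("hdfs://namenode/data/student.v"),  # hypothetical host
    describe_source("s3://some-bucket/student.v"),    # hypothetical bucket
]
```

A string without a recognized scheme, such as a bare relative path, raises ValueError in this sketch.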
- Parameters
delimiter (char, optional) – Column delimiter. Defaults to ‘,’
header_row (bool, optional) – Whether the source has a header. If True, column names are read from the first row of the source; otherwise they are named ‘f0’, ‘f1’, …. Defaults to True.
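The effect of delimiter and header_row described above can be sketched with the standard csv module. The function `read_columns` and the sample rows are hypothetical and only mimic the naming rule ('f0', 'f1', … when there is no header); they are not part of Loader.

```python
import csv
import io


def read_columns(text, delimiter=",", header_row=True):
    """Split delimited text into (column_names, data_rows) (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header_row:
        # The first row supplies the column names.
        names, data = rows[0], rows[1:]
    else:
        # No header: generate positional names f0, f1, ...
        names = ["f{}".format(i) for i in range(len(rows[0]))]
        data = rows
    return names, data


with_header = read_columns("src_id|dst_id|dist\n1|2|5\n", delimiter="|", header_row=True)
without_header = read_columns("1|2|5\n", delimiter="|", header_row=False)
```

Values stay as strings here; any type conversion is outside the scope of this sketch.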
Notes
Data is resolved by drivers in libvineyard. For more information, see the Loading Graph section of the docs and the implementations in libvineyard.
-
Graph Functions
-
graphscope.framework.graph_utils.load_from(
    edges: Union[Mapping[str, Union[Sequence, graphscope.framework.loader.Loader, str, Sequence[numpy.ndarray], pandas.core.frame.DataFrame, graphscope.framework.vineyard_object.VineyardObject, Mapping]], graphscope.framework.loader.Loader, str, Sequence[numpy.ndarray], pandas.core.frame.DataFrame, graphscope.framework.vineyard_object.VineyardObject, Sequence],
    vertices: Optional[Union[Mapping[str, Union[Sequence, graphscope.framework.loader.Loader, str, Sequence[numpy.ndarray], pandas.core.frame.DataFrame, graphscope.framework.vineyard_object.VineyardObject, Mapping]], graphscope.framework.loader.Loader, str, Sequence[numpy.ndarray], pandas.core.frame.DataFrame, graphscope.framework.vineyard_object.VineyardObject, Sequence]] = None,
    directed=True,
    oid_type='int64_t',
    generate_eid=True,
) → graphscope.framework.graph.Graph
Load an Arrow property graph using a list of vertex/edge specifications.
- Use a dict of tuples to set up a graph.
A dict can hold the vertex and edge configurations used to build the graph.
Examples:
g = graphscope_session.load_from(
    edges={
        "group": [
            (
                "file:///home/admin/group.e",
                ["group_id", "member_size"],
                ("leader_student_id", "student"),
                ("member_student_id", "student"),
            ),
            (
                "file:///home/admin/group_for_teacher_student.e",
                ["group_id", "group_name", "establish_date"],
                ("teacher_in_charge_id", "teacher"),
                ("member_student_id", "student"),
            ),
        ]
    },
    vertices={
        "student": (
            "file:///home/admin/student.v",
            ["name", "lesson_nums", "avg_score"],
            "student_id",
        ),
        "teacher": (
            "file:///home/admin/teacher.v",
            ["name", "salary", "age"],
            "teacher_id",
        ),
    },
)
In this example, “group” is the edge label, and “student” and “teacher” are the vertex labels. Edges are stored in the ‘both_in_out’ format; edges with label “group” link “student” to “student” and “teacher” to “student”.
- Use a dict of dicts to set up a graph.
We can also give each element inside the tuple a meaningful name, which makes the configuration easier to understand.
Examples:
g = graphscope_session.load_from(
    edges={
        "group": [
            {
                "loader": "file:///home/admin/group.e",
                "properties": ["group_id", "member_size"],
                "source": ("leader_student_id", "student"),
                "destination": ("member_student_id", "student"),
            },
            {
                "loader": "file:///home/admin/group_for_teacher_student.e",
                "properties": ["group_id", "group_name", "establish_date"],
                "source": ("teacher_in_charge_id", "teacher"),
                "destination": ("member_student_id", "student"),
            },
        ]
    },
    vertices={
        "student": {
            "loader": "file:///home/admin/student.v",
            "properties": ["name", "lesson_nums", "avg_score"],
            "vid": "student_id",
        },
        "teacher": {
            "loader": "file:///home/admin/teacher.v",
            "properties": ["name", "salary", "age"],
            "vid": "teacher_id",
        },
    },
)
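The two forms above carry the same information, so one can be derived from the other. The sketch below maps the positional tuple form onto the named dict form, using the key names shown in the examples. The helpers `edge_tuple_to_dict` and `vertex_tuple_to_dict` are hypothetical and are not part of GraphScope.

```python
def edge_tuple_to_dict(spec):
    """Convert (loader, properties, source, destination) to the dict form."""
    loader, properties, source, destination = spec
    return {
        "loader": loader,
        "properties": properties,
        "source": source,
        "destination": destination,
    }


def vertex_tuple_to_dict(spec):
    """Convert (loader, properties, vid) to the dict form."""
    loader, properties, vid = spec
    return {"loader": loader, "properties": properties, "vid": vid}


edge_dict = edge_tuple_to_dict(
    (
        "file:///home/admin/group.e",
        ["group_id", "member_size"],
        ("leader_student_id", "student"),
        ("member_student_id", "student"),
    )
)
vertex_dict = vertex_tuple_to_dict(
    (
        "file:///home/admin/student.v",
        ["name", "lesson_nums", "avg_score"],
        "student_id",
    )
)
```

Unpacking the tuple positionally also acts as a light check: a tuple with the wrong number of elements raises ValueError.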
- Parameters
edges – Edge configuration of the graph
vertices (optional) – Vertex configuration of the graph. Defaults to None. If None, the src_label and dst_label of every edge are assumed to be deducible and unambiguous.
directed (bool, optional) – Indicates whether the graph should be treated as directed or undirected. Defaults to True.
oid_type (str, optional) – ID type of graph. Can be “int64_t” or “string”. Defaults to “int64_t”.
generate_eid (bool, optional) – Whether to generate a unique edge id for each edge. The generated eid is placed in the third column. This feature exists for cooperating with the interactive engine. If you only need to work with the analytical engine, set it to False. Defaults to True.