Graph#

Graph object#

class graphscope.framework.graph.GraphDAGNode(session, incoming_data=None, oid_type='int64', vid_type='uint64', directed=True, generate_eid=True, retain_oid=True, vertex_map: str | int = 'global', compact_edges=False, use_perfect_hash=False)[source]#

A class that represents a graph node in a DAG.

In GraphScope, all operations that generate a new graph return an instance of GraphDAGNode, which is automatically executed by Session.run() in eager mode.

The following example demonstrates its usage:

>>> # lazy mode
>>> import graphscope as gs
>>> sess = gs.session(mode="lazy")
>>> g = sess.g()
>>> g1 = g.add_vertices("person.csv","person")
>>> print(g1) # <graphscope.framework.graph.GraphDAGNode object>
>>> g2 = sess.run(g1)
>>> print(g2) # <graphscope.framework.graph.Graph object>

>>> # eager mode
>>> import graphscope as gs
>>> sess = gs.session(mode="eager")
>>> g = sess.g()
>>> g1 = g.add_vertices("person.csv","person")
>>> print(g1) # <graphscope.framework.graph.Graph object>
>>> del g1
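
The difference between the two modes shown above can be pictured with a toy model. This is only a sketch under stated assumptions: ToyNode, ToySession, and wrap are hypothetical names invented for illustration, not GraphScope classes, and a real session does far more work than this.

```python
class ToyNode:
    """Stands in for a DAG node: it records an operation without running it."""

    def __init__(self, op):
        self.op = op  # a zero-argument callable describing the work


class ToySession:
    """Hypothetical session that mimics lazy versus eager evaluation."""

    def __init__(self, mode):
        if mode not in ("lazy", "eager"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def run(self, node):
        # Executing a node yields the concrete result.
        return node.op()

    def wrap(self, node):
        # Eager sessions run every new node right away; lazy sessions
        # hand the unevaluated node back to the caller.
        return self.run(node) if self.mode == "eager" else node


lazy = ToySession("lazy")
pending = lazy.wrap(ToyNode(lambda: "graph with person vertices"))
result = lazy.run(pending)  # explicit run, like sess.run(g1) in lazy mode

eager = ToySession("eager")
immediate = eager.wrap(ToyNode(lambda: "graph with person vertices"))
```

In the lazy session the caller holds a node until run is called, whereas the eager session hands back the evaluated result directly.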
__init__(session, incoming_data=None, oid_type='int64', vid_type='uint64', directed=True, generate_eid=True, retain_oid=True, vertex_map: str | int = 'global', compact_edges=False, use_perfect_hash=False)[source]#

Construct a GraphDAGNode object.

Parameters:
  • session (Session) – A graphscope session instance.

  • incoming_data

    Graph can be initialized through various types of sources, which can be one of:

  • oid_type (str, optional) – Type of vertex original id. Defaults to “int64”.

  • vid_type (str, optional) – Type of vertex internal id. Defaults to “uint64”.

  • directed (bool, optional) – Directed graph or not. Defaults to True.

  • generate_eid (bool, optional) – Generate id for each edge when set True. Defaults to True.

  • retain_oid (bool, optional) – Keep original ID in vertex table when set True. Defaults to True.

  • vertex_map (str, optional) – Whether to use a global or a local vertex map. Can be “global” or “local”. Defaults to “global”.

  • compact_edges (bool, optional) – Compact edges (CSR) using varint and delta encoding. Compact edges help to halve the memory usage of edges in the graph data structure, but may degrade performance by up to 10%~20% in some algorithms. Defaults to False.

  • use_perfect_hash (bool, optional) – Use perfect hash in vertex map to optimize the memory usage. Defaults to False.
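
To give a feel for what "varint and delta encoding" means for compact_edges, the following is a minimal sketch that encodes one sorted neighbor list. It assumes non-negative integer neighbor ids and uses a common little-endian base-128 varint; the function names are hypothetical, and this is not GraphScope's actual memory layout.

```python
def encode_varint(value):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def compact_neighbors(sorted_neighbors):
    """Store each neighbor as the varint of its gap to the previous one."""
    out = bytearray()
    prev = 0
    for neighbor in sorted_neighbors:
        if neighbor < prev:
            raise ValueError("neighbors must be sorted and non-negative")
        out += encode_varint(neighbor - prev)
        prev = neighbor
    return bytes(out)


def expand_neighbors(data):
    """Invert compact_neighbors: decode the varints and re-accumulate the gaps."""
    result, prev, delta, shift = [], 0, 0, 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += delta
            result.append(prev)
            delta, shift = 0, 0
    return result


neighbors = [1000, 1003, 1010, 5000]
encoded = compact_neighbors(neighbors)
```

Small gaps fit in a single byte, which is where the memory saving comes from; the price is that neighbors must be decoded sequentially, which is consistent with the performance cost mentioned above.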

add_column(results, selector)[source]#

Add the results as a column to the graph. Modification rules are given by the selector.

Parameters:
Returns:

A new graph with new columns, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

add_edges(edges, label='_e', properties=None, src_label=None, dst_label=None, src_field: int | str = 0, dst_field: int | str = 1)[source]#

Add edges to the graph, and return a new graph. The src_label and dst_label must be either both specified or both unspecified. There are four cases:

  1. src_label and dst_label are both unspecified and the current graph has no vertex label.

    The vertex label is deduced from the edge table, and the vertex label name is set to ‘_’.

  2. src_label and dst_label are both unspecified and the current graph has one vertex label.

    Both src_label and dst_label are set to this single vertex label.

  3. src_label and dst_label are both specified and both exist in the current graph’s vertex labels.

  4. src_label and dst_label are both specified and some of them do not exist in the current graph’s vertex labels.

    The missing vertex labels are deduced from the edge tables.
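
The four cases above can be sketched as a small pure-Python function. This is an illustration only: resolve_edge_labels is a hypothetical helper, not part of GraphScope, and rejecting unspecified labels when the graph has several vertex labels is an assumption, since the text above does not cover that situation.

```python
def resolve_edge_labels(vertex_labels, src_label=None, dst_label=None):
    """Return (src_label, dst_label, labels_to_deduce) for the cases above."""
    if (src_label is None) != (dst_label is None):
        raise ValueError(
            "src_label and dst_label must be both specified or both unspecified"
        )
    if src_label is None:
        if not vertex_labels:
            # Case 1: no vertex label yet, so one named "_" is deduced.
            return "_", "_", ["_"]
        if len(vertex_labels) == 1:
            # Case 2: reuse the single existing vertex label.
            only = vertex_labels[0]
            return only, only, []
        # Assumption: with several labels the choice is ambiguous.
        raise ValueError("src_label and dst_label are ambiguous")
    # Cases 3 and 4: labels missing from the graph are deduced from the edges.
    missing = []
    for label in (src_label, dst_label):
        if label not in vertex_labels and label not in missing:
            missing.append(label)
    return src_label, dst_label, missing


case_1 = resolve_edge_labels([])
case_2 = resolve_edge_labels(["person"])
case_3 = resolve_edge_labels(["person", "software"], "person", "software")
case_4 = resolve_edge_labels(["person"], "person", "software")
```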

Parameters:
  • edges (Union[str, Loader]) – Edge data source.

  • label (str, optional) – Edge label name. Defaults to “_e”.

  • properties (list[str], optional) – List of column names loaded as properties. Defaults to None.

  • src_label (str, optional) – Source vertex label. Defaults to None.

  • dst_label (str, optional) – Destination vertex label. Defaults to None.

  • src_field (int or str, optional) – Column index or name used as src field. Defaults to 0.

  • dst_field (int or str, optional) – Column index or name used as dst field. Defaults to 1.

Raises:

ValueError – If the given value is invalid or conflicts with the current graph.

Returns:

A new graph with edges added, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

add_vertices(vertices, label='_', properties=None, vid_field: int | str = 0)[source]#

Add vertices to the graph, and return a new graph.

Parameters:
  • vertices (Union[str, Loader]) – Vertex data source.

  • label (str, optional) – Vertex label name. Defaults to “_”.

  • properties (list[str], optional) – List of column names loaded as properties. Defaults to None.

  • vid_field (int or str, optional) – Column index or property name used as id field. Defaults to 0.

Raises:

ValueError – If the given value is invalid or conflicts with the current graph.

Returns:

A new graph with vertices added, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

consolidate_columns(label: str, columns: List[str] | Tuple[str], result_column: str)[source]#

Consolidate columns of given vertex / edge properties (of same type) into one column.

For example, if we have a graph with vertex label “person”, and edge labels “knows” and “follows”, and we want to consolidate the “weight0”, “weight1” properties of the vertex and both edges into a new column “weight”, we can do:

>>> g = ...
>>> g = g.consolidate_columns("person", ["weight0", "weight1"], "weight")
>>> g = g.consolidate_columns("knows", ["weight0", "weight1"], "weight")
>>> g = g.consolidate_columns("follows", ["weight0", "weight1"], "weight")
Parameters:
  • label – the label of the vertex or edge.

  • columns (list or tuple of str) – the properties of given vertex or edge to be consolidated.

  • result_column – the name of the new column.

Returns:

A new graph with columns consolidated, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

project(vertices: Mapping[str, List[str] | None], edges: Mapping[str, List[str] | None])[source]#

Project a subgraph from the property graph, and return a new graph. A graph produced by project behaves just like a normal property graph, and can be projected further.

Parameters:
  • vertices (dict) – The key is the vertex label name, and the value is a list of str giving the names of the properties to keep. If the value is None, all properties are selected. Note that the vertex labels of every edge you want to project must be included.

  • edges (dict) – key is the edge label name, the value is a list of str, which represents the name of properties. Specifically, it will select all properties if value is None.

Returns:

A new graph projected from the property graph, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode
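
The selection rules for project can be pictured with a toy schema. This is a sketch under assumptions, not GraphScope code: project_schema and the schema dictionaries are hypothetical, and raising ValueError when an edge endpoint label is left out is an assumed way of enforcing the note on vertices.

```python
def project_schema(vertex_props, edge_defs, vertices, edges):
    """Apply a project-style selection to a toy schema.

    vertex_props: {vertex label: [property names]}
    edge_defs: {edge label: {"properties": [...], "endpoints": [(src, dst), ...]}}
    vertices, edges: {label: list of property names, or None for all}
    """
    projected_vertices = {}
    for label, props in vertices.items():
        available = vertex_props[label]
        # None selects every property of the label.
        projected_vertices[label] = list(available) if props is None else list(props)

    projected_edges = {}
    for label, props in edges.items():
        definition = edge_defs[label]
        for src, dst in definition["endpoints"]:
            # The vertex labels on both ends of a projected edge must be kept.
            if src not in vertices or dst not in vertices:
                raise ValueError(f"edge {label!r} needs vertex labels {src!r} and {dst!r}")
        available = definition["properties"]
        projected_edges[label] = list(available) if props is None else list(props)
    return projected_vertices, projected_edges


vertex_props = {"person": ["name", "age"], "software": ["lang"]}
edge_defs = {
    "knows": {"properties": ["weight"], "endpoints": [("person", "person")]},
    "created": {"properties": ["weight"], "endpoints": [("person", "software")]},
}
kept_vertices, kept_edges = project_schema(
    vertex_props, edge_defs, {"person": None}, {"knows": ["weight"]}
)
```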

class graphscope.framework.graph.Graph(graph_node)[source]#

A class for representing the metadata of a graph in GraphScope.

A Graph object holds the metadata of a graph, such as its key, its schema, and whether the graph is directed.

Note that the graph itself is stored by a backend such as the Analytical Engine or Vineyard. In other words, the Graph object holds nothing but metadata.

The following example demonstrates its usage:

>>> import graphscope as gs
>>> sess = gs.session()
>>> graph = sess.g()
>>> graph = graph.add_vertices("person.csv", "person")
>>> graph = graph.add_vertices("software.csv", "software")
>>> graph = graph.add_edges("knows.csv", "knows", src_label="person", dst_label="person")
>>> graph = graph.add_edges("created.csv", "created", src_label="person", dst_label="software")
>>> print(graph)
>>> print(graph.schema)
__init__(graph_node)[source]#

Construct a Graph object.

add_column(results, selector)[source]#

Add the results as a column to the graph. Modification rules are given by the selector.

Parameters:
Returns:

A new graph with new columns, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

add_edges(edges, label='_', properties=None, src_label=None, dst_label=None, src_field: int | str = 0, dst_field: int | str = 1) Graph | GraphDAGNode[source]#

Add edges to the graph, and return a new graph. The src_label and dst_label must be either both specified or both unspecified. There are four cases:

  1. src_label and dst_label are both unspecified and the current graph has no vertex label.

    The vertex label is deduced from the edge table, and the vertex label name is set to ‘_’.

  2. src_label and dst_label are both unspecified and the current graph has one vertex label.

    Both src_label and dst_label are set to this single vertex label.

  3. src_label and dst_label are both specified and both exist in the current graph’s vertex labels.

  4. src_label and dst_label are both specified and some of them do not exist in the current graph’s vertex labels.

    The missing vertex labels are deduced from the edge tables.

Parameters:
  • edges (Union[str, Loader]) – Edge data source.

  • label (str, optional) – Edge label name. Defaults to “_”.

  • properties (list[str], optional) – List of column names loaded as properties. Defaults to None.

  • src_label (str, optional) – Source vertex label. Defaults to None.

  • dst_label (str, optional) – Destination vertex label. Defaults to None.

  • src_field (int or str, optional) – Column index or name used as src field. Defaults to 0.

  • dst_field (int or str, optional) – Column index or name used as dst field. Defaults to 1.

Raises:

ValueError – If the given value is invalid or conflicts with the current graph.

Returns:

A new graph with edges added, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

add_vertices(vertices, label='_', properties=None, vid_field: int | str = 0) Graph | GraphDAGNode[source]#

Add vertices to the graph, and return a new graph.

Parameters:
  • vertices (Union[str, Loader]) – Vertex data source.

  • label (str, optional) – Vertex label name. Defaults to “_”.

  • properties (list[str], optional) – List of column names loaded as properties. Defaults to None.

  • vid_field (int or str, optional) – Column index or property name used as id field. Defaults to 0.

Raises:

ValueError – If the given value is invalid or conflicts with the current graph.

Returns:

A new graph with vertices added, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

archive(path)[source]#

Archive the graph as GAR-format files based on the graph info. The metadata and data of the graph are dumped to the specified location, and can be restored by Graph.deserialize in other sessions.

Parameters:

path (str) – the graph info file path.

consolidate_columns(label: str, columns: List[str] | Tuple[str], result_column: str) Graph | GraphDAGNode[source]#

Consolidate columns of given vertex / edge properties (of same type) into one column.

For example, if we have a graph with vertex label “person”, and edge labels “knows” and “follows”, and we want to consolidate the “weight0”, “weight1” properties of the vertex and both edges into a new column “weight”, we can do:

>>> g = ...
>>> g = g.consolidate_columns("person", ["weight0", "weight1"], "weight")
>>> g = g.consolidate_columns("knows", ["weight0", "weight1"], "weight")
>>> g = g.consolidate_columns("follows", ["weight0", "weight1"], "weight")
Parameters:
  • label – the label of the vertex or edge.

  • columns (list or tuple of str) – the properties of given vertex or edge to be consolidated.

  • result_column – the name of the new column.

Returns:

A new graph with columns consolidated, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

detach()[source]#

Detaching a graph leaves it in vineyard even after the variable for this Graph object goes out of lexical scope.

The graph can be accessed using the graph’s ObjectID or its name later.

property key#

The key of the corresponding graph in engine.

classmethod load_from(path, sess, **kwargs)[source]#

Construct a Graph by deserializing it from path. This reads all serialization files that were dumped by Graph.serialize. If any serialization file is missing or broken, an error is raised.

Parameters:
  • path (str) – Path contains the serialization files.

  • sess (graphscope.Session) – The target session in which the graph will be constructed.

Returns:

A new graph object. Its schema and data are expected to be identical to those of the graph on which the serialize method was called.

Return type:

Graph

loaded()[source]#

True if current graph has been loaded in the session.

project(vertices: Mapping[str, List[str] | None], edges: Mapping[str, List[str] | None]) Graph | GraphDAGNode[source]#

Project a subgraph from the property graph, and return a new graph. A graph produced by project behaves just like a normal property graph, and can be projected further.

Parameters:
  • vertices (dict) – The key is the vertex label name, and the value is a list of str giving the names of the properties to keep. If the value is None, all properties are selected. Note that the vertex labels of every edge you want to project must be included.

  • edges (dict) – key is the edge label name, the value is a list of str, which represents the name of properties. Specifically, it will select all properties if value is None.

Returns:

A new graph projected from the property graph, evaluated in eager mode.

Return type:

graphscope.framework.graph.GraphDAGNode

save_to(path, **kwargs)[source]#

Serialize the graph to a location. The metadata and data of the graph are dumped to the specified location, and can be restored by Graph.load_from in other sessions.

Each worker will write a path_{worker_id}.meta file and a path_{worker_id} file to storage.

Parameters:

path (str) – Supported storages are local, hdfs, oss, s3.
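
The per-worker naming scheme described above can be spelled out as follows. This is only a sketch: expected_serialization_files is a hypothetical helper, the example path is made up, and numbering workers from 0 to num_workers - 1 is an assumption not stated in the text.

```python
def expected_serialization_files(path, num_workers):
    """List the files that the naming scheme above implies for each worker."""
    files = []
    for worker_id in range(num_workers):  # assumption: ids run from 0 upward
        files.append(f"{path}_{worker_id}.meta")
        files.append(f"{path}_{worker_id}")
    return files


files = expected_serialization_files("/tmp/my_graph", 2)
```

A listing like this can be handy for checking that every worker finished writing before another session tries to restore the graph.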

property schema#

Schema of the graph.

Returns:

the schema of the graph

Return type:

GraphSchema

property schema_path#

Path to which the Coordinator writes the interactive schema.

Returns:

The path that contains the schema for the interactive engine.

Return type:

str

property session_id#

Get the current session_id.

Returns:

The ID of the session that the graph belongs to.

Return type:

str

to_dataframe(selector, vertex_range=None)[source]#

Select some elements of the graph and output them as a pandas.DataFrame.

Parameters:
  • selector (dict) – Select some portions of graph.

  • vertex_range (dict, optional) – Slice vertices. Defaults to None.

Returns:

pandas.DataFrame

to_directed()[source]#

Returns a directed representation of the graph.

Returns:

A directed graph with the same name, same nodes, and with each edge (u, v, data) replaced by two directed edges (u, v, data) and (v, u, data).

Return type:

Graph

to_numpy(selector, vertex_range=None)[source]#

Select some elements of the graph and output to numpy.

Parameters:
  • selector (str) – Select a portion of graph as a numpy.ndarray.

  • vertex_range (dict, optional) – Slice vertices. Defaults to None.

Returns:

numpy.ndarray

to_undirected()[source]#

Returns an undirected representation of the digraph.

Returns:

An undirected graph with the same name and nodes, and with edge (u, v, data) if either (u, v, data) or (v, u, data) is in the digraph. If both edges exist in the digraph, they will both be preserved. You must check and correct for this manually if desired.

Return type:

Graph
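
Because reciprocal edges are both preserved, a manual clean-up pass may be needed afterwards. The following is a minimal sketch of such a pass on a plain list of (u, v, data) tuples; drop_reciprocal_duplicates is a hypothetical helper that works on Python tuples rather than on a GraphScope graph, and keeping the first edge seen for each pair is an arbitrary choice.

```python
def drop_reciprocal_duplicates(edges):
    """Keep one edge per unordered endpoint pair, preferring the first seen.

    Note that this also collapses parallel edges in the same direction.
    """
    seen = set()
    kept = []
    for u, v, data in edges:
        key = frozenset((u, v))  # (u, v) and (v, u) map to the same key
        if key in seen:
            continue
        seen.add(key)
        kept.append((u, v, data))
    return kept


edges = [(1, 2, "a"), (2, 1, "b"), (2, 3, "c")]
cleaned = drop_reciprocal_duplicates(edges)
```

If the data on the two directions differs, a merge rule (for example summing weights) would have to replace the "first seen wins" policy used here.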

property vineyard_id#

Get the vineyard object_id of this graph.

Returns:

The vineyard ID of this graph.

Return type:

str

Loader object#

class graphscope.framework.loader.Loader(source, delimiter=',', sep=',', header_row=True, filetype=None, **kwargs)[source]#

Generic data source wrapper. Loader can take various data sources, and assemble the necessary information into an AttrValue.

__init__(source, delimiter=',', sep=',', header_row=True, filetype=None, **kwargs)[source]#

Initialize a loader with configurable options. Note: Loader cannot be reused since it may change inner state when constructing information for loading a graph.

Parameters:
  • source (str or value) –

    The data source to be loaded, which can be one of the following:

    • local file: specified by URL file://...

    • oss file: specified by URL oss://...

    • hdfs file: specified by URL hdfs://...

    • s3 file: specified by URL s3://...

    • numpy ndarray, in CSR format

    • pandas dataframe

    Ordinary data sources can also be loaded through a vineyard stream. If a vineyard:// prefix is used in the URL, the local file, OSS object, or HDFS file is first loaded into a vineyard stream, and GraphScope’s fragment is then built upon those streams in vineyard.

    Once the stream IO in vineyard reaches a stable state, it will be the default mode to load data sources and construct fragments in GraphScope.

  • delimiter (char, optional) – Column delimiter. Defaults to ‘,’

  • header_row (bool, optional) – Whether the source has a header. If True, column names are read from the first row of the source; otherwise they are named ‘f0’, ‘f1’, …. Defaults to True.

  • filetype (str, optional) – Specify the type of files to load, can be “CSV”, “ORC”, and “PARQUET”. Default is “CSV”.
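
The effect of delimiter and header_row on column naming can be illustrated with the standard csv module. This is a sketch of the rule described above, not the Loader's actual code path; column_names is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def column_names(text, delimiter=",", header_row=True):
    """Return the column names implied by the first row of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    first_row = next(reader)
    if header_row:
        # The first row holds the column names.
        return first_row
    # No header: columns are named f0, f1, ... by position.
    return [f"f{i}" for i in range(len(first_row))]


with_header = column_names("id,name\n1,alice\n")
without_header = column_names("1|alice\n", delimiter="|", header_row=False)
```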

Notes

Data is resolved by drivers in vineyard. See the Loading Graph section of the docs for more information, and vineyard for the implementations.

Graph Functions#

graphscope.framework.graph_builder.load_from(edges: Mapping[str, Loader | str | Sequence[ndarray] | DataFrame | Object | ObjectID | ObjectName | Sequence | Mapping] | Loader | str | Sequence[ndarray] | DataFrame | Object | ObjectID | ObjectName | Sequence, vertices: Mapping[str, Loader | str | Sequence[ndarray] | DataFrame | Object | ObjectID | ObjectName | Sequence | Mapping] | Loader | str | Sequence[ndarray] | DataFrame | Object | ObjectID | ObjectName | Sequence | None = None, directed=True, oid_type='int64_t', vid_type='uint64_t', generate_eid=True, retain_oid=True, vformat=None, eformat=None, vertex_map='global', compact_edges=False, use_perfect_hash=False) Graph[source]#

Load an Arrow property graph using a list of vertex/edge specifications.

Deprecated since version 0.3: Use graphscope.Graph() instead.

  • Use a dict of tuples to set up a graph.

    We can use a dict to set vertex and edge configurations, which can be used to build graphs.

    Examples:

    g = graphscope_session.load_from(
        edges={
            "group": [
                (
                    "file:///home/admin/group.e",
                    ["group_id", "member_size"],
                    ("leader_student_id", "student"),
                    ("member_student_id", "student"),
                ),
                (
                    "file:///home/admin/group_for_teacher_student.e",
                    ["group_id", "group_name", "establish_date"],
                    ("teacher_in_charge_id", "teacher"),
                    ("member_student_id", "student"),
                ),
            ]
        },
        vertices={
            "student": (
                "file:///home/admin/student.v",
                ["name", "lesson_nums", "avg_score"],
                "student_id",
            ),
            "teacher": (
                "file:///home/admin/teacher.v",
                ["name", "salary", "age"],
                "teacher_id",
            ),
        },
    )
    

    ‘e’ is the label of the edges and ‘v’ is the label of the vertices. Edges are stored in the ‘both_in_out’ format, with edges labeled ‘e’ linking from ‘v’ to ‘v’.

  • Use a dict of dicts to set up a graph.

    We can also give each element inside the tuple a meaningful name, which makes the configuration easier to understand.

    Examples:

    g = graphscope_session.load_from(
        edges={
            "group": [
                {
                    "loader": "file:///home/admin/group.e",
                    "properties": ["group_id", "member_size"],
                    "source": ("leader_student_id", "student"),
                    "destination": ("member_student_id", "student"),
                },
                {
                    "loader": "file:///home/admin/group_for_teacher_student.e",
                    "properties": ["group_id", "group_name", "establish_date"],
                    "source": ("teacher_in_charge_id", "teacher"),
                    "destination": ("member_student_id", "student"),
                },
            ]
        },
        vertices={
            "student": {
                "loader": "file:///home/admin/student.v",
                "properties": ["name", "lesson_nums", "avg_score"],
                "vid": "student_id",
            },
            "teacher": {
                "loader": "file:///home/admin/teacher.v",
                "properties": ["name", "salary", "age"],
                "vid": "teacher_id",
            },
        },
    )
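
The two styles above carry the same information, which the following sketch makes explicit by converting the tuple form into the dict form. The key names ("loader", "properties", "source", "destination", "vid") are taken from the examples above; the two helper functions are hypothetical and are not part of GraphScope.

```python
def edge_tuple_to_dict(spec):
    """Name the four positions of an edge tuple as in the dict-style example."""
    loader, properties, source, destination = spec
    return {
        "loader": loader,
        "properties": properties,
        "source": source,
        "destination": destination,
    }


def vertex_tuple_to_dict(spec):
    """Name the three positions of a vertex tuple as in the dict-style example."""
    loader, properties, vid = spec
    return {"loader": loader, "properties": properties, "vid": vid}


edge = edge_tuple_to_dict(
    (
        "file:///home/admin/group.e",
        ["group_id", "member_size"],
        ("leader_student_id", "student"),
        ("member_student_id", "student"),
    )
)
vertex = vertex_tuple_to_dict(
    ("file:///home/admin/student.v", ["name", "lesson_nums", "avg_score"], "student_id")
)
```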
    
Parameters:
  • edges – Edge configuration of the graph

  • vertices (optional) – Vertex configuration of the graph. Defaults to None. If None, we assume that the src_label and dst_label of every edge can be deduced unambiguously.

  • directed (bool, optional) – Indicate whether the graph should be treated as directed or undirected.

  • oid_type (str, optional) – ID type of graph. Can be “int32_t”, “int64_t” or “string”. Defaults to “int64_t”.

  • vid_type (str, optional) – Internal vertex ID type of graph. Can be “uint32_t” and “uint64_t”. Defaults to “uint64_t”.

  • generate_eid (bool, optional) – Whether to generate a unique edge id for each edge. Generated eid will be placed in third column. This feature is for cooperating with interactive engine. If you only need to work with analytical engine, set it to False. Defaults to True.

  • retain_oid (bool, optional) – Whether to keep the original ID column as the last column of vertex table. This feature is for cooperating with interactive engine. If you only need to work with analytical engine, set it to False. Defaults to True.

  • vertex_map (str, optional) – Whether to use a global or a local vertex map. Can be “global” or “local”.

  • compact_edges (bool, optional) – Compact edges (CSR) using varint and delta encoding. Compact edges help to halve the memory usage of edges in the graph data structure, but may degrade performance by up to 10%~20% in some algorithms. Defaults to False.

  • use_perfect_hash (bool, optional) – Use perfect hashmap in vertex map to optimize the memory usage. Defaults to False.