In-memory immutable graphs on Vineyard#
Vineyard is a distributed immutable in-memory data manager that is used as the storage backend for immutable graphs in GraphScope. Vineyard provides zero-copy data sharing using memory mapping, and different compute engines in GraphScope can run on the same vineyard cluster to efficiently share the graph data.
Graphs in Vineyard#
Vineyard supports immutable property graphs and abstracts them as the vineyard::ArrowFragment class, which stores edges in CSR (compressed sparse row) form and uses tables to store edge and vertex properties. On top of ArrowFragment, vineyard abstracts a distributed graph as a vineyard::ArrowFragmentGroup, which consists of a set of fragments spread across the cluster.
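To make the "CSR for edges plus property tables" idea concrete, here is a minimal, self-contained sketch of a CSR adjacency structure. It is an illustration only and not vineyard's actual ArrowFragment layout: the `Csr` struct, `BuildCsr`, and `OutDegree` are hypothetical names, and the `edge_rows` array stands in for the link between an edge and its row in an edge property table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative CSR layout (hypothetical, NOT vineyard's real data structure).
struct Csr {
  std::vector<std::size_t> offsets;      // size = num_vertices + 1
  std::vector<std::uint32_t> neighbors;  // destination vertex of each out-edge
  std::vector<std::size_t> edge_rows;    // row of each edge in a property table
};

// Build a CSR over outgoing edges with a counting sort on the source vertex.
// edges[i] = (src, dst); i doubles as the row index in the edge property table.
Csr BuildCsr(std::size_t num_vertices,
             const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
  Csr csr;
  csr.offsets.assign(num_vertices + 1, 0);
  for (const auto& e : edges) {
    assert(e.first < num_vertices && e.second < num_vertices);
    ++csr.offsets[e.first + 1];  // count the out-degree of each source
  }
  for (std::size_t v = 0; v < num_vertices; ++v) {
    csr.offsets[v + 1] += csr.offsets[v];  // prefix sum gives start offsets
  }
  csr.neighbors.resize(edges.size());
  csr.edge_rows.resize(edges.size());
  // cursor[v] is the next free slot in v's adjacency range.
  std::vector<std::size_t> cursor(csr.offsets.begin(), csr.offsets.end() - 1);
  for (std::size_t i = 0; i < edges.size(); ++i) {
    const std::size_t pos = cursor[edges[i].first]++;
    csr.neighbors[pos] = edges[i].second;
    csr.edge_rows[pos] = i;
  }
  return csr;
}

// Out-degree of vertex v is the width of its adjacency range.
std::size_t OutDegree(const Csr& csr, std::uint32_t v) {
  return csr.offsets[v + 1] - csr.offsets[v];
}
```

The out-edges of vertex `v` occupy the contiguous range `neighbors[offsets[v]]` to `neighbors[offsets[v + 1] - 1]`, so neighbor scans are sequential reads, and `edge_rows` lets each edge look up its properties in a separate table.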
Loading Graphs to Vineyard#
Vineyard can be deployed as a standalone service or launched along with GraphScope.
A command-line tool, vineyard-graph-loader, is provided to load fragments into vineyard. It first accepts an optional argument --socket <vineyard-ipc-socket>, which points to the IPC socket that the loader will connect to. If omitted, the value is resolved from the environment variable VINEYARD_IPC_SOCKET. The loader takes either a set of command-line arguments or a JSON file as configuration.
$ vineyard-graph-loader --help
Usage: loading vertices and edges as vineyard graph.
- ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] \
<e_label_num> <efiles...> <v_label_num> <vfiles...> \
[directed] [generate_eid] [retain_oid] [string_oid]
- or: ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] --config <config.json>
The config is a json file and should look like
{
"vertices": [
{
"data_path": "....",
"label": "...",
"options": "...."
},
...
],
"edges": [
{
"data_path": "",
"label": "",
"src_label": "",
"dst_label": "",
"options": ""
},
...
],
"directed": 1, # 0 or 1
"generate_eid": 1, # 0 or 1
"retain_oid": 1, # 0 or 1
"string_oid": 0, # 0 or 1
"local_vertex_map": 0 # 0 or 1
}
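The socket lookup order described above (an explicit --socket argument first, then the VINEYARD_IPC_SOCKET environment variable) can be sketched as follows. `ResolveIpcSocket` is a hypothetical helper written for illustration; it is not taken from the loader's source, and it assumes --socket appears as the first argument, as in the usage text.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of vineyard): returns the IPC socket path
// from a leading "--socket <path>" pair if present, otherwise from the
// VINEYARD_IPC_SOCKET environment variable, otherwise an empty string.
std::string ResolveIpcSocket(int argc, const char* const* argv) {
  if (argc >= 3 && std::string(argv[1]) == "--socket") {
    return std::string(argv[2]);
  }
  if (const char* env = std::getenv("VINEYARD_IPC_SOCKET")) {
    return std::string(env);
  }
  return std::string();
}
```

A caller would treat an empty result as an error, since the loader then has no socket to connect to.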
Some of the options that specify how the graph will be constructed are:
directed: whether the graph is directed or undirected.
generate_eid: whether to generate a globally unique edge id for each edge.
retain_oid: whether to retain the original vertex id in the final vertex property table.
string_oid: whether the vertex id is a string.
local_vertex_map: whether to use a local vertex map during graph construction, which is usually done to reduce memory usage.
The vineyard-graph-loader can load the modern graph in either of the following ways:
using command line arguments
The vineyard-graph-loader accepts a sequence of command line arguments to specify the edge files and vertex files, e.g.,
$ ./vineyard-graph-loader 2 "modern_graph/knows.csv#header_row=true&src_label=person&dst_label=person&label=knows&delimiter=|" \
    "modern_graph/created.csv#header_row=true&src_label=person&dst_label=software&label=created&delimiter=|" \
    2 "modern_graph/person.csv#header_row=true&label=person&delimiter=|" \
    "modern_graph/software.csv#header_row=true&label=software&delimiter=|"
using a JSON configuration file
$ ./vineyard-graph-loader --config config.json
The JSON configuration file could be (using the “modern graph” as an example):
{
  "vertices": [
    {
      "data_path": "modern_graph/person.csv",
      "label": "person",
      "options": "header_row=true&delimiter=|"
    },
    {
      "data_path": "modern_graph/software.csv",
      "label": "software",
      "options": "header_row=true&delimiter=|"
    }
  ],
  "edges": [
    {
      "data_path": "modern_graph/knows.csv",
      "label": "knows",
      "src_label": "person",
      "dst_label": "person",
      "options": "header_row=true&delimiter=|"
    },
    {
      "data_path": "modern_graph/created.csv",
      "label": "created",
      "src_label": "person",
      "dst_label": "software",
      "options": "header_row=true&delimiter=|"
    }
  ],
  "directed": 1,
  "generate_eid": 1,
  "string_oid": 0,
  "local_vertex_map": 0
}
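In the command-line form, each file argument packs a path and its options into one string, `path#key=value&key=value`, whereas the JSON form splits the same information into `data_path` and `options`. The sketch below shows how such a string decomposes. `SplitLocation` is a hypothetical helper for illustration, not the loader's real parser, and it assumes keys and values contain no `#`, `&`, or `=` characters.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of vineyard): splits "path#k1=v1&k2=v2"
// into the path and a key/value map of options.
std::pair<std::string, std::map<std::string, std::string>> SplitLocation(
    const std::string& location) {
  std::map<std::string, std::string> options;
  const std::size_t hash = location.find('#');
  if (hash == std::string::npos) {
    return {location, options};  // no option part at all
  }
  std::size_t begin = hash + 1;
  while (begin < location.size()) {
    std::size_t end = location.find('&', begin);
    if (end == std::string::npos) {
      end = location.size();
    }
    const std::string item = location.substr(begin, end - begin);
    const std::size_t eq = item.find('=');
    if (eq != std::string::npos) {
      options[item.substr(0, eq)] = item.substr(eq + 1);
    } else if (!item.empty()) {
      options[item] = "";  // bare key without a value
    }
    begin = end + 1;
  }
  return {location.substr(0, hash), options};
}
```

Applied to `modern_graph/person.csv#header_row=true&label=person&delimiter=|`, this yields the path `modern_graph/person.csv` and the options `header_row`, `label`, and `delimiter`, which correspond to the `data_path`, `label`, and `options` fields of the JSON entry for the same file.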
Using Loaded Graphs#
Once loaded into vineyard, the fragments can be accessed through vineyard’s IPC client (vineyard::Client in the code below):
void WriteOut(vineyard::Client& client, const grape::CommSpec& comm_spec,
              vineyard::ObjectID fragment_group_id) {
  LOG(INFO) << "Loaded graph to vineyard: " << fragment_group_id;
  // Fetch the fragment group, which maps fragment ids to object ids.
  std::shared_ptr<vineyard::ArrowFragmentGroup> fg =
      std::dynamic_pointer_cast<vineyard::ArrowFragmentGroup>(
          client.GetObject(fragment_group_id));

  // List every fragment in the group, local or remote.
  for (const auto& pair : fg->Fragments()) {
    LOG(INFO) << "[frag-" << pair.first << "]: " << pair.second;
  }

  // NB: only retrieve local fragments, i.e. those located on the vineyard
  // instance this client is connected to.
  auto locations = fg->FragmentLocations();
  for (const auto& pair : fg->Fragments()) {
    if (locations.at(pair.first) != client.instance_id()) {
      continue;
    }
    auto frag_id = pair.second;
    Traverse(client, frag_id);
  }
}
The local fragment can be traversed using the vineyard::ArrowFragment API:
// GraphType and LabelType are not defined in this snippet; they are assumed
// to be aliases for a concrete vineyard::ArrowFragment instantiation and its
// label id type.
void Traverse(vineyard::Client& client, vineyard::ObjectID frag_id) {
  auto frag = std::dynamic_pointer_cast<GraphType>(client.GetObject(frag_id));

  // Global and per-fragment statistics.
  LOG(INFO) << "graph total node number: " << frag->GetTotalNodesNum();
  LOG(INFO) << "fragment edge number: " << frag->GetEdgeNum();
  LOG(INFO) << "fragment in edge number: " << frag->GetInEdgeNum();
  LOG(INFO) << "fragment out edge number: " << frag->GetOutEdgeNum();

  // Print the schema of the property table of every vertex label.
  for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
    LOG(INFO) << "vertex table: " << vlabel << " -> "
              << frag->vertex_data_table(vlabel)->schema()->ToString();
  }

  // Print the schema of the property table of every edge label.
  for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
    LOG(INFO) << "edge table: " << elabel << " -> "
              << frag->edge_data_table(elabel)->schema()->ToString();
  }

  LOG(INFO) << "--------------- consolidate vertex/edge table columns ...";

  // As written, this part only re-prints the schemas, and only when the
  // first vertex (or edge) table has at least four columns.
  if (frag->vertex_data_table(0)->columns().size() >= 4) {
    for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
      LOG(INFO) << "vertex table: " << vlabel << " -> "
                << frag->vertex_data_table(vlabel)->schema()->ToString();
    }
  }

  if (frag->edge_data_table(0)->columns().size() >= 4) {
    for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
      LOG(INFO) << "edge table: " << elabel << " -> "
                << frag->edge_data_table(elabel)->schema()->ToString();
    }
  }
}