In-memory immutable graphs on Vineyard

Vineyard is a distributed immutable in-memory data manager that is used as the storage backend for immutable graphs in GraphScope. Vineyard provides zero-copy data sharing using memory mapping, and different compute engines in GraphScope can run on the same vineyard cluster to efficiently share the graph data.

Graphs in Vineyard

Vineyard supports immutable property graphs and abstracts them as the vineyard::ArrowFragment class, which stores edges in a CSR (compressed sparse row) structure and keeps vertex and edge properties in tables. On top of ArrowFragment, vineyard abstracts a distributed graph as vineyard::ArrowFragmentGroup, which consists of a set of fragments spread across the cluster.
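To make the CSR idea concrete, the following is a minimal sketch using only the C++ standard library. It is not vineyard's actual memory layout: ToyCSR, BuildCSR and OutDegree are hypothetical names chosen for illustration, and the edge_rows array stands in for the link between an edge and its row in a property table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative toy CSR (an assumption for this example, not vineyard's layout).
struct ToyCSR {
  std::vector<std::size_t> offsets;      // size = vertex_count + 1
  std::vector<std::uint32_t> neighbors;  // destination ids, grouped by source
  std::vector<std::size_t> edge_rows;    // row of each edge in a property table
};

// Build a CSR from (src, dst) pairs; the i-th input edge owns property row i.
ToyCSR BuildCSR(
    std::size_t vertex_count,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
  ToyCSR csr;
  csr.offsets.assign(vertex_count + 1, 0);
  // Count the out-degree of every source vertex.
  for (const auto& e : edges) {
    assert(e.first < vertex_count && e.second < vertex_count);
    ++csr.offsets[e.first + 1];
  }
  // Prefix sum turns the counts into start offsets.
  for (std::size_t v = 0; v < vertex_count; ++v) {
    csr.offsets[v + 1] += csr.offsets[v];
  }
  csr.neighbors.resize(edges.size());
  csr.edge_rows.resize(edges.size());
  // cursor[v] is the next free slot in the adjacency range of vertex v.
  std::vector<std::size_t> cursor(csr.offsets.begin(), csr.offsets.end() - 1);
  for (std::size_t i = 0; i < edges.size(); ++i) {
    const std::size_t slot = cursor[edges[i].first]++;
    csr.neighbors[slot] = edges[i].second;
    csr.edge_rows[slot] = i;
  }
  return csr;
}

// Out-degree of v is the length of its adjacency range.
std::size_t OutDegree(const ToyCSR& csr, std::uint32_t v) {
  return csr.offsets[v + 1] - csr.offsets[v];
}
```

With this layout, the out-neighbors of vertex v are the contiguous range neighbors[offsets[v]] up to neighbors[offsets[v + 1]], which is why a CSR suits immutable graphs: the arrays are built once and then only read.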

Loading Graphs to Vineyard

Vineyard can be deployed as a standalone service or launched along with GraphScope. A command-line tool, vineyard-graph-loader, is provided to load fragments into vineyard. It first accepts an optional argument --socket <vineyard-ipc-socket>, which points to the IPC socket that the loader will connect to. If omitted, the value is resolved from the environment variable VINEYARD_IPC_SOCKET. The rest of the configuration is given either as a set of command-line arguments or as a JSON file.
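The socket lookup order described above (explicit --socket first, then VINEYARD_IPC_SOCKET) can be sketched as follows. ResolveIPCSocket is a hypothetical helper written for this illustration and is not part of vineyard; the loader's real argument handling may differ.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not a vineyard API): returns the IPC socket path,
// preferring a leading "--socket <path>" pair over the environment variable.
// An empty string means that no socket could be resolved.
std::string ResolveIPCSocket(const std::vector<std::string>& args) {
  // The loader expects --socket as the first argument, if present at all.
  if (args.size() >= 2 && args[0] == "--socket") {
    return args[1];
  }
  // Otherwise fall back to the environment variable named in the text.
  if (const char* env = std::getenv("VINEYARD_IPC_SOCKET")) {
    return std::string(env);
  }
  return std::string();
}
```

A caller would pass the arguments that follow the program name, for example std::vector<std::string>(argv + 1, argv + argc), and treat an empty result as an error.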

$ vineyard-graph-loader --help
Usage: loading vertices and edges as vineyard graph.

    -     ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] \
                                   <e_label_num> <efiles...> <v_label_num> <vfiles...> \
                                   [directed] [generate_eid] [retain_oid] [string_oid]

    - or: ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] --config <config.json>

          The config is a json file and should look like

          {
              "vertices": [
                  {
                      "data_path": "....",
                      "label": "...",
                      "options": "...."
                  },
                  ...
              ],
              "edges": [
                  {
                      "data_path": "",
                      "label": "",
                      "src_label": "",
                      "dst_label": "",
                      "options": ""
                  },
                  ...
              ],
              "directed": 1, # 0 or 1
              "generate_eid": 1, # 0 or 1
              "retain_oid": 1, # 0 or 1
              "string_oid": 0, # 0 or 1
              "local_vertex_map": 0 # 0 or 1
          }

Some of the options that specify how the graph will be constructed are:

  • directed: whether the graph is directed or undirected.

  • generate_eid: whether to generate a globally unique edge id for each edge.

  • retain_oid: whether to retain the original vertex id in the final vertex property table.

  • string_oid: whether the vertex id is a string.

  • local_vertex_map: whether to use a local vertex map during graph construction, which is usually done to reduce memory usage.

The “modern graph” can be loaded with vineyard-graph-loader in either of the following ways:

  • using command line arguments

    The vineyard-graph-loader accepts a sequence of command line arguments to specify the edge files and vertex files, e.g.,

    $ ./vineyard-graph-loader 2 "modern_graph/knows.csv#header_row=true&src_label=person&dst_label=person&label=knows&delimiter=|" \
                                "modern_graph/created.csv#header_row=true&src_label=person&dst_label=software&label=created&delimiter=|" \
                              2 "modern_graph/person.csv#header_row=true&label=person&delimiter=|" \
                                "modern_graph/software.csv#header_row=true&label=software&delimiter=|"
    
  • using a JSON configuration file

    $ ./vineyard-graph-loader --config config.json
    

    The JSON configuration file could be (using the “modern graph” as an example):

       {
           "vertices": [
               {
                   "data_path": "modern_graph/person.csv",
                   "label": "person",
                   "options": "header_row=true&delimiter=|"
               },
               {
                   "data_path": "modern_graph/software.csv",
                   "label": "software",
                   "options": "header_row=true&delimiter=|"
               }
           ],
           "edges": [
               {
                   "data_path": "modern_graph/knows.csv",
                   "label": "knows",
                   "src_label": "person",
                   "dst_label": "person",
                   "options": "header_row=true&delimiter=|"
               },
               {
                   "data_path": "modern_graph/created.csv",
                   "label": "created",
                   "src_label": "person",
                   "dst_label": "software",
                   "options": "header_row=true&delimiter=|"
               }
           ],
           "directed": 1,
           "generate_eid": 1,
           "string_oid": 0,
           "local_vertex_map": 0
       }
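The file arguments on the command line pack a path and its options into one string of the form path#key=value&key=value, and the "options" fields in the JSON file use the same key=value&key=value shape. The sketch below shows one way such a string could be split. FileSpec and ParseFileSpec are hypothetical names for this illustration only; this is not the parser used by vineyard-graph-loader, and it does not handle escaping.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result type for this example (not a vineyard type).
struct FileSpec {
  std::string path;                            // text before '#'
  std::map<std::string, std::string> options;  // key=value pairs after '#'
};

// Split "path#k1=v1&k2=v2" into a path and an option map.
FileSpec ParseFileSpec(const std::string& spec) {
  FileSpec result;
  const std::size_t hash = spec.find('#');
  result.path = spec.substr(0, hash);  // whole string when there is no '#'
  if (hash == std::string::npos) {
    return result;
  }
  std::size_t begin = hash + 1;
  while (begin < spec.size()) {
    std::size_t end = spec.find('&', begin);
    if (end == std::string::npos) {
      end = spec.size();
    }
    const std::string pair = spec.substr(begin, end - begin);
    const std::size_t eq = pair.find('=');
    // Entries without a key or without '=' are skipped.
    if (eq != std::string::npos && eq > 0) {
      result.options[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    begin = end + 1;
  }
  return result;
}
```

Applied to the knows.csv argument shown earlier, this yields the path modern_graph/knows.csv and five options, including label set to knows and delimiter set to |.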
    

Using Loaded Graphs

Once the graph is loaded into vineyard, its fragments can be accessed through vineyard’s IPC client (vineyard::Client in the code below):

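// Forward declaration: Traverse is defined further down.
void Traverse(vineyard::Client& client, vineyard::ObjectID frag_id);

void WriteOut(vineyard::Client& client, const grape::CommSpec& comm_spec,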
              vineyard::ObjectID fragment_group_id) {
  LOG(INFO) << "Loaded graph to vineyard: " << fragment_group_id;
  std::shared_ptr<vineyard::ArrowFragmentGroup> fg =
      std::dynamic_pointer_cast<vineyard::ArrowFragmentGroup>(
          client.GetObject(fragment_group_id));

  for (const auto& pair : fg->Fragments()) {
    LOG(INFO) << "[frag-" << pair.first << "]: " << pair.second;
  }

  // NB: only retrieve local fragments.
  auto locations = fg->FragmentLocations();
  for (const auto& pair : fg->Fragments()) {
    if (locations.at(pair.first) != client.instance_id()) {
      continue;
    }
    auto frag_id = pair.second;
    Traverse(client, frag_id);
  }
}

The local fragment can be traversed using the vineyard::ArrowFragment’s API:

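// GraphType and LabelType are not defined in this snippet. They are assumed to
// be aliases for a concrete vineyard::ArrowFragment<...> instantiation and its
// label id type.
void Traverse(vineyard::Client& client, vineyard::ObjectID frag_id) {
  auto frag = std::dynamic_pointer_cast<GraphType>(client.GetObject(frag_id));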
  LOG(INFO) << "graph total node number: " << frag->GetTotalNodesNum();
  LOG(INFO) << "fragment edge number: " << frag->GetEdgeNum();
  LOG(INFO) << "fragment in edge number: " << frag->GetInEdgeNum();
  LOG(INFO) << "fragment out edge number: " << frag->GetOutEdgeNum();

  for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
    LOG(INFO) << "vertex table: " << vlabel << " -> "
              << frag->vertex_data_table(vlabel)->schema()->ToString();
  }
  for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
    LOG(INFO) << "edge table: " << elabel << " -> "
              << frag->edge_data_table(elabel)->schema()->ToString();
  }

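  // NB: the consolidation calls themselves are not shown in this snippet; the
  // loops below only print the table schemas again.
  LOG(INFO) << "--------------- consolidate vertex/edge table columns ...";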

  if (frag->vertex_data_table(0)->columns().size() >= 4) {
    for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
      LOG(INFO) << "vertex table: " << vlabel << " -> "
                << frag->vertex_data_table(vlabel)->schema()->ToString();
    }
  }

  if (frag->edge_data_table(0)->columns().size() >= 4) {
    for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
      LOG(INFO) << "edge table: " << elabel << " -> "
                << frag->edge_data_table(elabel)->schema()->ToString();
    }
  }
}