Key Features

Ease-of-use: Python Interface

The Python interface of GraphScope offers an intuitive and user-friendly way for data scientists to develop, test and deploy complex graph computation workflows quickly and correctly. The session abstraction encapsulates the environment where data manipulation and graph computation operations are executed or evaluated. The Python clients interact with the GraphScope graph computation service cluster via sessions.

GraphScope also provides NetworkX-compatible APIs for creating, manipulating, querying and analyzing graphs, offering an easy transition for users with prior experience in the NetworkX library. The built-in graph analysis algorithms in GraphScope have interfaces compatible with their NetworkX counterparts, and can be parallelized for a performance boost of several orders of magnitude.

Graph Traversal Support, in Gremlin and Cypher

GraphScope Interactive Engine (GIE) leverages Gremlin, a graph traversal language developed by Apache TinkerPop, to offer a powerful and intuitive way of querying graphs interactively, and supports automatic query parallelization. GIE implements TinkerPop’s Gremlin Server interface so that the system can interact seamlessly with the TinkerPop ecosystem, including development tools such as Gremlin Console and language wrappers such as Gremlin-Python and Gremlin-Java.
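To give a feel for what a Gremlin traversal computes, the following is a minimal plain-Python sketch of the step-by-step semantics of `g.V().has('name', 'marko').out('knows').values('name')`. It is only an illustration over a toy in-memory graph: the `vertices` and `out_edges` dictionaries and the helper functions `has`, `out` and `values` are hypothetical stand-ins written for this example, not GIE or TinkerPop APIs, and the sketch says nothing about how GIE parallelizes queries.

```python
# Toy property graph (illustrative data only).
# vertices: vertex id -> properties; out_edges: vertex id -> (label, target) pairs.
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop"},
    4: {"name": "josh", "age": 32},
}
out_edges = {
    1: [("knows", 2), ("knows", 4), ("created", 3)],
    4: [("created", 3)],
}


def has(traversers, key, value):
    """Keep vertices whose property `key` equals `value`, like has(key, value)."""
    return (v for v in traversers if vertices[v].get(key) == value)


def out(traversers, label):
    """Follow outgoing edges carrying `label` from each vertex, like out(label)."""
    for v in traversers:
        for edge_label, target in out_edges.get(v, []):
            if edge_label == label:
                yield target


def values(traversers, key):
    """Project property `key` from each vertex that has it, like values(key)."""
    return (vertices[v][key] for v in traversers if key in vertices[v])


# Iterating over `vertices` plays the role of g.V(): it yields every vertex id.
friends = list(values(out(has(vertices, "name", "marko"), "knows"), "name"))
print(friends)
```

Each step consumes a stream of traversers and produces a new one, which is why traversals compose by simple chaining.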

Cypher is a declarative programming language for graph queries. It is designed to make it easy for users to specify the graph patterns they are interested in. We are currently working on adding support for the Cypher language to provide users with more flexibility and options on graph queries.

High-Performance Built-in Algorithms

GraphScope is equipped with various built-in graph algorithms, comprising 20 graph analytics algorithms and 8 GNN models. The built-in algorithms are highly optimized. For example, we have performed a comparison with state-of-the-art graph processing systems on the LDBC Graph Analytics Benchmark, and the results show that GraphScope outperforms other graph systems (see more detailed results here).

Extensible Algorithm Library for Graph Analytics

GraphScope Analytical Engine (GAE) provides a rich set of commonly used algorithms, including connectivity and path analysis, community detection and centrality computations. This directory includes a full list of the built-in algorithms, which is continuously growing.

GAE also gives developers the flexibility to write their own algorithms with different programming models and programming languages. Currently, the programming models that GAE supports include the subgraph-based PIE model and the vertex-centric Pregel model. GAE also provides a multi-language SDK, so users can choose to write their own algorithms in C++, Java or Python.
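As a rough illustration of the vertex-centric Pregel model mentioned above, here is a minimal single-process sketch of single-source shortest paths organized into supersteps and messages. It does not use the GAE SDK: the function `pregel_sssp` and its data layout are assumptions made for this example, and a real engine would distribute vertices across workers rather than loop over them in one process. The sketch assumes non-negative edge weights.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Single-source shortest paths in a vertex-centric, superstep-based style.

    edges: dict mapping vertex -> list of (neighbor, weight) pairs.
    Returns a dict mapping every vertex to its distance from `source`.
    """
    # Collect all vertices, including those that only appear as targets.
    all_vertices = set(edges)
    for nbrs in edges.values():
        all_vertices.update(n for n, _ in nbrs)

    dist = {v: math.inf for v in all_vertices}
    # Superstep 0: only the source vertex has a message (distance 0).
    inbox = {source: [0.0]}

    # Run supersteps until no messages are in flight (all vertices halted).
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                # The vertex improved its value, so it notifies its neighbors.
                dist[v] = best
                for nbr, weight in edges.get(v, []):
                    outbox[nbr].append(best + weight)
        inbox = outbox
    return dist


graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 6.0)],
    "c": [("d", 1.0)],
}
distances = pregel_sssp(graph, "a")
print(distances)
```

The per-vertex logic only reads incoming messages and sends outgoing ones, which is what makes this style of algorithm straightforward to parallelize.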

GNN Training & Inference

GraphScope Learning Engine (GLE) offers users an easy-to-use approach to high-performance GNN training and inference on large-scale graphs. GLE provides multiple commonly used (negative) sampling operators to facilitate graph sampling, and allows users to define graph queries (e.g., N-hop sampling queries) in a Gremlin-style Graph Query Language (GSL). It also comes with a wide range of GNN models, such as GCN, GAT, GraphSAGE, and SEAL, and includes commonly used algorithm modules (e.g., DeepWalk and TransE). Furthermore, GLE provides a set of paradigms and processes to ease the development of customized models. GLE is compatible with PyG, e.g., this example shows that a PyG model can be trained using GLE with very minor modifications. Users can flexibly choose TensorFlow or PyTorch as the training backend.
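The idea behind N-hop sampling and negative sampling can be sketched in a few lines of plain Python. This is not GLE's API or GSL syntax: the functions `sample_n_hop` and `sample_negatives`, the adjacency-dict layout, and the choice of uniform sampling with replacement are all assumptions made for illustration.

```python
import random


def sample_n_hop(adj, seeds, fanouts, rng):
    """Sample a fixed-fanout neighborhood around the seed vertices.

    adj: dict mapping vertex -> list of neighbors.
    fanouts: number of neighbors to draw per vertex at each hop, e.g. [2, 2].
    Returns a list of layers; layer 0 holds the seeds, layer k the k-hop samples.
    """
    layers = [list(seeds)]
    for fanout in fanouts:
        frontier = []
        for v in layers[-1]:
            nbrs = adj.get(v, [])
            if nbrs:
                # Sample with replacement so each vertex contributes exactly `fanout` picks.
                frontier.extend(rng.choices(nbrs, k=fanout))
        layers.append(frontier)
    return layers


def sample_negatives(adj, src, num, rng):
    """Draw `num` vertices uniformly from those that are not neighbors of `src`."""
    excluded = set(adj.get(src, [])) | {src}
    pool = [v for v in adj if v not in excluded]
    return rng.choices(pool, k=num) if pool else []


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: [0]}
rng = random.Random(42)  # fixed seed so runs are repeatable

layers = sample_n_hop(adj, seeds=[0], fanouts=[2, 2], rng=rng)
negatives = sample_negatives(adj, src=0, num=3, rng=rng)
print(layers, negatives)
```

With fanouts `[2, 2]`, one seed yields 2 one-hop samples and 4 two-hop samples, so the sampled subgraph has a predictable shape that can be batched for training.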

To support online inference on dynamic graphs, we propose Dynamic Graph Service (DGS) in GLE to facilitate real-time sampling on dynamic graphs. The sampled subgraph can be fed into a serving module (e.g., TensorFlow Serving) to obtain the inference results. This document provides a detailed, step-by-step tutorial on using GLE for offline training and online inference.

Cloud Native Design

Cloud native is the software approach of building, deploying, and managing modern applications in cloud computing environments. GraphScope provides a set of Dockerfiles, so users can easily build Docker images of GraphScope. In addition, users can deploy GraphScope on a Kubernetes (k8s) cluster, or on a managed Kubernetes service from a cloud provider (e.g., EKS for AWS and ACK for Alibaba Cloud). GraphScope also provides essential tools that help users handle monitoring, failover, scaling and rolling updates.

Across-Engine Workflow Orchestration

GraphScope allows users to orchestrate complex graph computation workflows that combine tasks of various workloads, such as graph query, graph analytics and graph learning. With Vineyard, which offers efficient in-memory data management, graph computation tasks backed by different engines (GAE, GIE and GLE) can be seamlessly orchestrated into customized workflows. Users only need to concentrate on the workflow logic; complex operations such as storing and transferring intermediate results, computation parallelization and workflow pipelining in a distributed cluster are handled automatically by GraphScope. This tutorial demonstrates an example of across-engine workflow orchestration in GraphScope.