Getting Started#

This tutorial gives a quick overview of GraphScope’s features. We begin by installing GraphScope on your local machine with Python. Most examples in this guide assume a local Python installation, but GraphScope also runs on a Kubernetes cluster.

You can easily install GraphScope through pip:

python3 -m pip install graphscope -U

Note

We recommend installing GraphScope in a clean Python 3.9 virtual environment created with miniconda or venv.

Taking venv as an example, the following steps create a virtual environment, activate it, and install GraphScope:

# Create a new virtual environment
python3.9 -m venv tutorial-env

# Activate the virtual environment
source tutorial-env/bin/activate

# Install GraphScope
python3.9 -m pip install graphscope

# Use GraphScope
python3.9
>>> import graphscope as gs
>>> ......
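Before going further, it can help to confirm that the active interpreter is the one you expect and that the package can be located. The snippet below is a minimal sketch that uses only the standard library. The helper name `check_environment` is hypothetical and is not part of GraphScope.

```python
import importlib.util
import sys


def check_environment(package="graphscope"):
    """Return (python_version, installed) for the running interpreter.

    `check_environment` is a hypothetical helper for this tutorial,
    not a GraphScope API.
    """
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    # find_spec returns None when a top-level package cannot be located.
    installed = importlib.util.find_spec(package) is not None
    return version, installed


version, installed = check_environment()
print("Python", version, "| graphscope importable:", installed)
```

If the second value is `False`, the virtual environment is probably not activated, or the install step ran under a different interpreter.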

One-stop Graph Processing#

We will use a walk-through example to demonstrate how GraphScope handles various graph computation tasks in a one-stop manner.

The example targets node classification on a citation network.

ogbn-mag is a heterogeneous network composed of a subset of the Microsoft Academic Graph. It contains four types of entities (papers, authors, institutions, and fields of study), as well as four types of directed relations, each connecting two entities.

Given the heterogeneous ogbn-mag data, the task is to predict the class of each paper. Node classification can identify papers in multiple venues, which represent different groups of scientific work on different topics. We apply both the attribute and structural information to classify papers. In the graph, each paper node contains a 128-dimensional word2vec vector representing its content, which is obtained by averaging the embeddings of words in its title and abstract. The embeddings of individual words are pre-trained. The structural information is computed on-the-fly.
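The feature construction described above (averaging the pre-trained embeddings of the words in a paper's title and abstract) can be sketched in plain Python. This is only an illustration: the toy vectors are 4-dimensional rather than 128-dimensional, the helper name `average_embedding` is hypothetical, and skipping words without an embedding is an assumption rather than something the dataset description states.

```python
def average_embedding(words, word_vectors):
    """Average the vectors of the words that have a pre-trained embedding."""
    # Assumption: words without an embedding are simply skipped.
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        raise ValueError("no word has a pre-trained embedding")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 4-dimensional vectors standing in for 128-dimensional word2vec output.
word_vectors = {
    "graph": [1.0, 0.0, 2.0, 0.0],
    "neural": [0.0, 1.0, 0.0, 2.0],
    "network": [2.0, 2.0, 1.0, 1.0],
}

title_and_abstract = ["graph", "neural", "network", "survey"]  # "survey" is skipped
feature = average_embedding(title_and_abstract, word_vectors)
print(feature)  # [1.0, 1.0, 1.0, 1.0]
```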

GraphScope models graph data as a property graph, in which vertices and edges are labeled and can carry multiple properties. Taking ogbn-mag as an example, the figure below shows the model of the property graph.

Figure: Sample of property graph.

This graph has four kinds of vertices, labeled paper, author, institution, and field_of_study. Four kinds of edges connect them. Each kind of edge has a label and specifies the vertex labels of its two endpoints. For example, cites edges connect two vertices labeled paper, while writes edges require the source vertex to be labeled author and the destination vertex to be labeled paper. Vertices and edges may have properties. For example, paper vertices have properties such as features, publication year, and subject label.
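To make the labeling rules concrete, here is a minimal plain-Python sketch of a property graph whose edges are checked against a schema. It does not use GraphScope. The class names, `EDGE_SCHEMA`, `make_edge`, and the property keys are all hypothetical, and only the two edge kinds named in the text (cites and writes) are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    src: Vertex
    dst: Vertex
    properties: dict = field(default_factory=dict)


# Edge label -> (required source label, required destination label).
# Only the two edge kinds named in the text are listed here.
EDGE_SCHEMA = {
    "cites": ("paper", "paper"),
    "writes": ("author", "paper"),
}


def make_edge(label, src, dst, **properties):
    """Create an edge, rejecting endpoints that violate the schema."""
    expected = EDGE_SCHEMA[label]
    if (src.label, dst.label) != expected:
        raise ValueError(
            "{} edge needs {} -> {}, got {} -> {}".format(
                label, expected[0], expected[1], src.label, dst.label
            )
        )
    return Edge(label, src, dst, properties)


alice = Vertex(0, "author")
p1 = Vertex(1, "paper", {"year": 2015, "label": 3})  # made-up property values
p2 = Vertex(2, "paper", {"year": 2012, "label": 7})

edges = [make_edge("writes", alice, p1), make_edge("cites", p1, p2)]
print([(e.label, e.src.vid, e.dst.vid) for e in edges])
# [('writes', 0, 1), ('cites', 1, 2)]
```

Calling `make_edge("writes", p1, alice)` raises `ValueError`, because a writes edge must start at an author vertex.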

Interactive queries enable users to explore, examine, and present graph data in a flexible and in-depth manner, allowing them to find specific information quickly. GraphScope utilizes Gremlin, a high-level graph traversal language, for interactive queries and offers efficient execution at scale.
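As an example of the kind of question an interactive query answers, consider counting the papers that two given authors wrote together. The sketch below answers it in plain Python on made-up data and shows, in a comment, how a Gremlin traversal for the same question might be phrased. The vertex ids, the `id` property key, and the helper `coauthored_count` are assumptions for illustration only.

```python
# Toy "writes" edges as (author_id, paper_id) pairs; all ids are made up.
writes = [(2, 10), (2, 11), (2, 12), (4307, 11), (4307, 12), (9, 10)]


def coauthored_count(writes, author_a, author_b):
    """Count papers that both authors wrote."""
    papers_a = {paper for author, paper in writes if author == author_a}
    papers_b = {paper for author, paper in writes if author == author_b}
    return len(papers_a & papers_b)


# A Gremlin traversal expressing the same question might look like this
# (the property key 'id' and the labels are assumptions about the schema):
#   g.V().has('author', 'id', 2).out('writes')
#        .where(__.in('writes').has('id', 4307)).count()
print(coauthored_count(writes, 2, 4307))  # 2
```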

Graph analytics is widely used in the real world. Many algorithms, like community detection, paths and connectivity, and centrality, have proven to be very useful in various businesses. GraphScope comes with a set of built-in algorithms, enabling users to easily analyze their graph data.
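To illustrate what a centrality algorithm computes, here is a small, self-contained PageRank written as a power iteration in plain Python. It is a sketch for intuition, not GraphScope's built-in implementation. The function name, the default parameters, and the toy citation edges are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (src, dst) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Toy citation edges: (citing paper, cited paper).
cites = [("p1", "p0"), ("p2", "p0"), ("p3", "p0"), ("p3", "p2")]
scores = pagerank(cites)
print(max(scores, key=scores.get))  # p0
```

The paper cited by every other paper (`p0`) receives the highest score, and the scores sum to 1 because the rank of the dangling node is redistributed on every iteration.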

Graph neural networks (GNNs) combine the strengths of graph analytics and machine learning. GNN algorithms can compress both the structural and the attribute information of a graph into a low-dimensional embedding vector for each node. These embeddings can then be fed into downstream machine learning tasks.
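The idea of folding structure and attributes into one vector per node can be sketched with a single round of mean aggregation, in which each node averages its own feature vector with those of its neighbors. This is only the aggregation step. A real GNN layer would also apply learned weights and a nonlinearity, and the function name and toy data here are hypothetical.

```python
def aggregate(features, neighbors):
    """One round of mean aggregation: average a node's vector with its neighbors'."""
    embeddings = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        # zip(*group) iterates over the vectors dimension by dimension.
        embeddings[node] = [sum(col) / len(group) for col in zip(*group)]
    return embeddings


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(aggregate(features, neighbors))
# {'a': [1.0, 1.0], 'b': [0.5, 0.5], 'c': [1.5, 1.0]}
```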

We then define the training process and run it.

Graph Analytical Task Quick Start#

The installed graphscope package includes everything you need to analyze a graph on your local machine. If you have a graph analytical job that needs to run iterative algorithms, it works well with graphscope.

Graph Interactive Query Quick Start#

With the graphscope package already installed, you can effortlessly engage with a graph on your local machine. You simply need to create the gremlin instance to serve as the conduit for submitting all Gremlin queries.

Graph Learning Quick Start#

TODO(LiSu):