GraphScope Learning Engine¶
The learning engine in GraphScope (GLE) derives from Graph-Learn, a distributed framework designed for the development and training of large-scale graph neural networks (GNNs). GLE provides a programming interface carefully designed for developing graph neural network models, and it has been widely applied in many scenarios within Alibaba, such as search recommendation, network security, and knowledge graphs.
Next, we will walk through a quick-start tutorial on how to build a user-defined GNN model using GLE.
Graph Learning Model¶
There are two ways to train a graph learning model. One is to compute directly on the whole graph. GCN and GAT were originally proposed with this approach, computing on the entire adjacency matrix. However, this approach consumes a huge amount of memory on large-scale graphs, limiting its applicability. The other approach is to sample subgraphs from the whole graph and apply the batch training commonly used in deep learning. Typical examples include GraphSAGE, FastGCN and GraphSAINT.
GLE is designed for large-scale graph neural networks. It consists of an efficient graph engine, a set of user-friendly APIs, and a rich set of built-in popular GNN models. The graph engine stores the graph topology and attributes in a distributed manner and supports efficient graph sampling and query. It can work with popular tensor engines including TensorFlow and PyTorch. In the following, our model implementations are based on TensorFlow.

Data model¶
To build and train a model, GLE usually samples subgraphs as the training data and performs batch training with them. We start by introducing the basic data model.
EgoGraph is the underlying data model in GLE. It consists of a batch of seed nodes or edges (named 'ego') and their receptive fields (multi-hop neighbors). We implement many built-in samplers to traverse the graph and sample the neighbors. Negative samplers are also implemented for unsupervised training. The sampled data grouped in an EgoGraph is organized in numpy format. It can be converted to a tensor format, EgoTensor, depending on the deep learning engine in use. GLE uses EgoFlow to convert EgoGraph to EgoTensor, and the EgoTensor serves as the training data.
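The relationship between these classes can be sketched as follows. The sampling functions and the ego specification (src_ego_spec) used here are placeholders; their concrete definitions appear in the GCN walkthrough later in this document.

import graphlearn as gle

# EgoGraphs (numpy format) are produced by user-defined sampling functions and
# converted by EgoFlow into EgoTensors (tensor format) for batch training.
# sample_seed, positive_sample, receptive_fn and src_ego_spec are placeholders
# defined in the GCN example below.
ego_flow = gle.EgoFlow(sample_seed,      # traverses seed nodes or edges
                       positive_sample,  # builds positive samples from the seeds
                       receptive_fn,     # samples multi-hop neighbors into EgoGraphs
                       src_ego_spec)     # describes the layout of each hop
iterator = ego_flow.iterator                      # iterator over training batches
pos_src_ego_tensor = ego_flow.pos_src_ego_tensor  # EgoTensor consumed by the encoders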

Encoder¶
A graph learning model can be viewed as using an encoder to encode the EgoTensor of a node, edge or subgraph into a vector.
GLE first uses feature encoders to encode the raw features of nodes or edges, and the resulting feature embeddings are then processed by different graph encoders to produce the final embedding vectors. For most GNN models, graph encoders provide a way to generate an abstraction of a target node or edge by aggregating information from its neighbors. This aggregation and encoding are usually implemented by stacking graph convolutional layers.
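To make this aggregate-and-encode idea concrete, here is a minimal, self-contained TensorFlow sketch of a single mean-aggregation step. It only illustrates the principle; it is not GLE's built-in GCNConv layer.

import numpy as np
import tensorflow as tf

def mean_aggregate_and_encode(self_feats, neighbor_feats, weight):
  # self_feats:     [batch, dim] features of the target (ego) nodes
  # neighbor_feats: [batch, num_neighbors, dim] features of their sampled neighbors
  # weight:         [2 * dim, out_dim] trainable projection matrix
  aggregated = tf.reduce_mean(neighbor_feats, axis=1)      # aggregate neighbor features
  combined = tf.concat([self_feats, aggregated], axis=-1)  # combine with ego features
  return tf.nn.relu(tf.matmul(combined, weight))           # encode into the output space

# Toy example: 4 ego nodes, 5 neighbors each, 8-dimensional input features.
self_feats = tf.constant(np.random.rand(4, 8), dtype=tf.float32)
neighbor_feats = tf.constant(np.random.rand(4, 5, 8), dtype=tf.float32)
weight = tf.constant(np.random.rand(16, 32), dtype=tf.float32)
embeddings = mean_aggregate_and_encode(self_feats, neighbor_feats, weight)  # shape [4, 32]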

Based on the data models and encoders, one can easily implement different graph learning models. We describe in detail how to develop a GNN model in the next section.
Developing Your Own Model¶
In this document, we introduce how to use the basic APIs provided by GLE together with a deep learning engine, such as TensorFlow, to build graph learning algorithms. We use GCN, one of the most popular graph neural network models, as the running example.
In general, building an algorithm requires the following four steps (see the skeleton sketched after this list).
1. Specify the sampling mode: use graph sampling and query methods to sample subgraphs and organize them into EgoGraph. We abstract out four basic functions: sample_seed, positive_sample, negative_sample and receptive_fn. To generate Nodes or Edges, we use sample_seed to traverse the graph. Then, we use positive_sample with the Nodes or Edges as inputs to generate positive samples for training. For unsupervised learning, negative_sample produces negative samples. GNNs need to aggregate neighbor information, so we abstract receptive_fn to sample neighbors. Finally, the Nodes and Edges produced by sample_seed, together with their sampled neighbors, form an EgoGraph.
2. Construct the graph data flow: convert EgoGraph to EgoTensor using EgoFlow. A GLE model runs on a deep learning engine such as TensorFlow, so the sampled EgoGraphs need to be converted to the tensor format EgoTensor. This conversion is encapsulated in EgoFlow, which also generates an iterator for batch training.
3. Define the encoders: use a graph encoder and feature encoders to encode EgoTensor. After getting the EgoTensor, we first encode the original node and edge features into vectors using common feature encoders. Then, we feed these vectors into a GNN model as the feature input. Next, we use the graph encoder to process the EgoTensor, combining the neighbor features with the node's own features to obtain the node or edge vectors.
4. Design the loss function and training process: select an appropriate loss function and write the training process. GLE has built-in common loss functions and optimizers, and it also encapsulates the training process. GLE supports both single-machine and distributed training. Users can also customize the loss functions, optimizers and training processes.
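Before walking through the GCN example, the skeleton below shows how these four steps typically map onto a LearningBasedModel subclass. The method names match the ones used in the rest of this tutorial; the bodies are intentionally left empty.

import graphlearn as gle

class MyGNN(gle.LearningBasedModel):
  # Step 1: sampling mode.
  def _sample_seed(self):          # traverse the graph to generate seed Nodes/Edges
    ...
  def _positive_sample(self, t):   # build positive samples from the seeds
    ...
  def _receptive_fn(self, nodes):  # sample multi-hop neighbors into an EgoGraph
    ...
  # Step 3: encoders for raw features and for the EgoGraph structure.
  def _encoders(self):
    ...
  # Steps 2 and 4: convert EgoGraph to EgoTensor via EgoFlow, encode, and define the loss.
  def build(self):
    ...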
Next, we introduce how to implement a GCN model using the above four steps.
Sampling¶
We use the Cora dataset as the node classification example. We provide a
simple data conversion script cora.py
to convert the original Cora
to the format required by GLE. The script generates following 5
files: node_table, edge_table_with_self_loop, train_table, val_table and
test_table. They are the node table, the edge table, and the nodes
tables used to distinguish training, validation, and testing sets.
Then, we can construct the graph using the following code snippet.
import graphlearn as gle

g = gle.Graph()\
      .node(dataset_folder + "node_table", node_type=node_type,
            decoder=gle.Decoder(labeled=True,
                                attr_types=["float"] * 1433,
                                attr_delimiter=":"))\
      .edge(dataset_folder + "edge_table_with_self_loop",
            edge_type=(node_type, node_type, edge_type),
            decoder=gle.Decoder(weighted=True), directed=False)\
      .node(dataset_folder + "train_table", node_type="train",
            decoder=gle.Decoder(weighted=True))\
      .node(dataset_folder + "val_table", node_type="val",
            decoder=gle.Decoder(weighted=True))\
      .node(dataset_folder + "test_table", node_type="test",
            decoder=gle.Decoder(weighted=True))
We load the graph into memory by calling g.init().
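In local mode a bare call is sufficient; g.init() can also take server_id, server_count and tracker arguments for distributed deployments, as shown in main() at the end of this tutorial.

g.init()  # single-machine mode: load the whole graph into local memory

# Distributed variant used later in main(), e.g.:
# g.init(server_id=0, server_count=1, tracker='../../data/')

With the graph in place, we then define the GCN model class.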
class GCN(gle.LearningBasedModel):
  def __init__(self,
               graph,
               output_dim,
               features_num,
               batch_size,
               categorical_attrs_desc='',
               hidden_dim=16,
               hops_num=2):
    self.graph = graph
    self.batch_size = batch_size
    # ...
The GCN model inherits from the basic learning model class LearningBasedModel. As a result, we only need to override the sampling, model construction, and other methods to build the GCN model.
class GCN(gle.LearningBasedModel):
  # ...
  def _sample_seed(self):
    return self.graph.V('train').batch(self.batch_size).values()

  def _positive_sample(self, t):
    return gle.Edges(t.ids, self.node_type,
                     t.ids, self.node_type,
                     self.edge_type, graph=self.graph)

  def _receptive_fn(self, nodes):
    return self.graph.V(nodes.type, feed=nodes).alias('v') \
      .outV(self.edge_type).sample().by('full').alias('v1') \
      .outV(self.edge_type).sample().by('full').alias('v2') \
      .emit(lambda x: gle.EgoGraph(x['v'], [gle.Layer(nodes=x['v1']), gle.Layer(nodes=x['v2'])]))
_sample_seed and _positive_sample are used to sample the seed nodes and the positive samples. _receptive_fn samples neighbors and organizes them into an EgoGraph. outV returns one-hop neighbors, so the above code samples two-hop neighbors. Users can choose different neighbor sampling methods. The original GCN requires all neighbors of each node, so we use 'full' sampling here. The sampling results are aggregated into the EgoGraph that is returned.
Graph Data Flow¶
In the build function, we convert EgoGraph to EgoTensor using EgoFlow. EgoFlow contains a data flow iterator and several EgoTensors.
class GCN(gle.LearningBasedModel):
  def build(self):
    ego_flow = gle.EgoFlow(self._sample_seed,
                           self._positive_sample,
                           self._receptive_fn,
                           self.src_ego_spec)
    iterator = ego_flow.iterator
    pos_src_ego_tensor = ego_flow.pos_src_ego_tensor
    # ...
We can get the EgoTensor corresponding to the previous EgoGraph from EgoFlow.
Model¶
Next, we first use a feature encoder to encode the original features. In this example, we use IdentityEncoder, which simply returns its input, because the features of Cora are already in vector format. For data with both discrete and continuous features, we can use WideNDeepEncoder. To learn more about encoders, please refer to feature encoder.
Then, we use the GCNConv layer to construct the graph encoder. For each node in GCN, we sample all of its neighbors and organize them in a sparse format. Therefore, we use SparseEgoGraphEncoder. For the neighbor-aligned model, please refer to the implementation of GraphSAGE.
class GCN(gle.LearningBasedModel):
  def _encoders(self):
    depth = self.hops_num
    feature_encoders = [gle.encoders.IdentityEncoder()] * (depth + 1)
    conv_layers = []
    # input layer
    conv_layers.append(gle.layers.GCNConv(self.hidden_dim))
    # hidden layers
    for i in range(1, depth - 1):
      conv_layers.append(gle.layers.GCNConv(self.hidden_dim))
    # output layer
    conv_layers.append(gle.layers.GCNConv(self.output_dim, act=None))
    encoder = gle.encoders.SparseEgoGraphEncoder(feature_encoders,
                                                 conv_layers)
    return {"src": encoder, "edge": None, "dst": None}
Loss Function and Training Process¶
For the Cora node classification task, we can select the corresponding classification loss function in TensorFlow. Then, we combine the encoders and the loss function in the build function, and finally return the loss and a data iterator.
class GCN(gle.LearningBasedModel):
  # ...
  def _supervised_loss(self, emb, label):
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=emb, labels=label))

  def build(self):
    ego_flow = gle.EgoFlow(self._sample_seed,
                           self._positive_sample,
                           self._receptive_fn,
                           self.src_ego_spec,
                           full_graph_mode=self.full_graph_mode)
    iterator = ego_flow.iterator
    pos_src_ego_tensor = ego_flow.pos_src_ego_tensor
    src_emb = self.encoders['src'].encode(pos_src_ego_tensor)
    labels = pos_src_ego_tensor.src.labels
    loss = self._supervised_loss(src_emb, labels)
    return loss, iterator
Next, we use LocalTFTrainer to train the model on a single machine.
def train(config, graph):
  def model_fn():
    return GCN(graph,
               config['class_num'],
               config['features_num'],
               config['batch_size'],
               ...)
  trainer = gle.LocalTFTrainer(model_fn, epoch=200)
  trainer.train()

def main():
  config = {...}
  g = load_graph(config)
  g.init(server_id=0, server_count=1, tracker='../../data/')
  train(config, g)
This concludes how to build a GCN model. Please refer to the GCN example for the complete code.
We have implemented a rich set of popular models, including GCN, GAT, GraphSAGE, DeepWalk, LINE, TransE, Bipartite GraphSAGE, sample-based GCN and GAT, etc., which can serve as a starting point for building similar models.