Tutorial: Run Giraph Applications on GraphScope

Apache Giraph is one of the most famous graph computing frameworks, built on top of Apache Hadoop. Through pregel interface, user can write vertex-centric graph algorithms.

GraphScope aiming to provide one-stop graph processing framework, including integrating with popular open-source graph computing framework. Actually, Giraph algorithms can be easily run on GraphScope without any adaptation.

Try some example giraph apps

We provide some example giraph algorithms, i.e. SSSP, PageRank in grape-demo.jar. You can try to run these Giraph algorithms on GraphScope.

As Giraph allows user to load graph with customized loader, we support Giraph VertexInputFormat and Giraph EdgeInputFormat with session.load_from method.

vformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
eformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PEdgeInputFormat"

#clone https://github.com/GraphScope/gstest to GS_TEST_DIR
graph = graphscope_session.load_from(
    vertices="/path/to/vertex-input",
    vformat=vformat,
    edges="/path/to/edge-input",
    eformat=eformat,
)

vertices and edges should points to vertex input and edge input. We also provide some example dataset gstest at GraphScope/gstest. In this tutorial we will only need p2p dataset. You can download it by:

wget https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.e /home/graphscope/p2p-31.e
wget https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.v /home/graphscope/p2p-31.v

Then you can load graph via graphscope python client, and query the graph with giraph app.

import graphscope
import os
from graphscope.framework.app import load_app

"""Or launch session in k8s cluster"""
sess = graphscope.session(cluster_type='hosts') 

sess.add_lib("/home/graphscope/grape-demo-0.19.0-shaded.jar")

# Remember to put giraph: before class name.
vformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
eformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PEdgeInputFormat"

# Replace path p2p.v and p2p.3 with your own path.
graph = sess.load_from(
    vertices=os.path.expandvars("/home/graphscope/p2p-31.v"),
    vformat=vformat,
    edges=os.path.expandvars("/home/graphscope/p2p-31.e"),
    eformat=eformat,
)
graph = graph._project_to_simple(v_prop="vdata", e_prop="data")

giraph_sssp = load_app(algo="giraph:com.alibaba.graphscope.example.giraph.SSSP")
ctx = giraph_sssp(graph, sourceId=6)

ctx.to_numpy('r')

Run your own Giraph apps.

After a successful running of example giraph SSSP algorithm, you may want to try your own giraph algorithm on GraphScope(which runs much faster then Giraph itself).

Develop Giraph algorithm

You can implement your algorithm towards Giraph’ original API. For example, you can use Giraph official example apps.

git clone https://github.com/apache/giraph.git
cd giraph/
mvn package -pl :giraph-examples

Then you could find giraph-examples-1.4.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar in directory giraph-examples/target.

Although almost all APIs are supported, there are indeed some limitation of Giraph-on-GraphScope.

  • Currently graph modification API is not supported.

  • Using of Complex Writable will cause performance degradation.

Submit to GraphScope.

The procedure almost the same as above, except that you need to replace the submitted jar, and choose right InputFormat classes.

import graphscope

"""Or launch session in k8s cluster"""
sess = graphscope.session(cluster_type='hosts') 

# path to local jar file, will be distributed over cluster
graphscope_session.add_lib("path/to/grape-demo.jar")

vformat = "giraph:${vertex-input-format-class-full-name}"
eformat = "giraph:${edge-input-format-class-full-name}"

#clone https://github.com/GraphScope/gstest to GS_TEST_DIR
graph = graphscope_session.load_from(
    vertices=os.path.expandvars("${path-to-vertex-file}"), # path to local vertex file, will be distributed over cluster
    vformat=vformat,
    edges=os.path.expandvars("${path-to-edge-file}"), # path to local edge file,  will be distributed over cluster
    eformat=eformat,
)
graph = graph._project_to_simple(v_prop="vdata", e_prop="data")

giraph_sssp = load_app(algo="giraph:${giraph-computation-class-full-name}")
ctx = giraph_sssp(g, "${a=1,b=2...}")