Tutorial: Run Giraph Applications on GraphScope
Apache Giraph is one of the best-known graph computing frameworks and is built on top of Apache Hadoop. Through its Pregel interface, users can write vertex-centric graph algorithms.
GraphScope aims to be a one-stop graph processing platform, which includes integrating with popular open-source graph computing frameworks. In fact, Giraph algorithms can run on GraphScope without any adaptation.
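To make the vertex-centric idea concrete, here is a toy, single-process Python simulation of a Pregel-style shortest-paths computation. It is not the Giraph API and not how GraphScope executes Giraph jobs; the function name `pregel_sssp` and the edge-list input are illustrative assumptions. In each superstep, a vertex that received messages keeps the smallest candidate distance and, if its value improved, messages its out-neighbors; vertices without messages stay halted, and the run ends when no messages are in flight.

```python
# Toy simulation of the Pregel (BSP) model, for illustration only.
# This is NOT the Giraph API; names and input format are assumptions.
import math
from collections import defaultdict


def pregel_sssp(edges, source, num_vertices, max_supersteps=100):
    """Vertex-centric single-source shortest paths.

    edges: list of (src, dst, weight) tuples of a directed graph,
    with vertex ids in range(num_vertices).
    """
    out_edges = defaultdict(list)
    for src, dst, w in edges:
        out_edges[src].append((dst, w))

    value = [math.inf] * num_vertices
    inbox = {source: [0.0]}  # superstep 0: only the source is active

    for _ in range(max_supersteps):
        if not inbox:  # no messages in flight: all vertices have halted
            break
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            # "compute()": keep the smallest candidate distance received
            candidate = min(messages)
            if candidate < value[v]:
                value[v] = candidate
                # send improved distances along out-edges
                for dst, w in out_edges[v]:
                    outbox[dst].append(candidate + w)
            # a vertex that did not improve sends nothing (votes to halt)
        inbox = outbox
    return value


edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
print(pregel_sssp(edges, source=0, num_vertices=5))  # [0.0, 3.0, 1.0, 4.0, inf]
```

In Giraph itself, the per-vertex logic would live in the `compute(vertex, messages)` method of a `BasicComputation` subclass.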
Try some example Giraph apps
We provide some example Giraph algorithms, such as SSSP and PageRank, in grape-demo.jar. You can try running them on GraphScope.
Since Giraph lets users load graphs with custom loaders, GraphScope supports Giraph's VertexInputFormat and EdgeInputFormat through the session.load_from method:
vformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
eformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PEdgeInputFormat"
# graphscope_session is an existing session, e.g. graphscope.session(cluster_type='hosts')
graph = graphscope_session.load_from(
vertices="/path/to/vertex-input",
vformat=vformat,
edges="/path/to/edge-input",
eformat=eformat,
)
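The examples in this tutorial prefix every Giraph class name with `giraph:`, both for input formats and for `load_app`. If you assemble class names programmatically, a small hypothetical helper like `giraph_class` below (not part of GraphScope) can add the prefix and reject names that are not fully qualified Java class names. The validation regex is a simplification of Java's naming rules.

```python
import re

# Hypothetical helper, not part of GraphScope: builds the "giraph:"-prefixed
# class name used by load_from(vformat=..., eformat=...) and load_app(algo=...).
# Simplified check: at least one package segment, Java-like identifiers.
_JAVA_CLASS = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)+$")


def giraph_class(full_name: str) -> str:
    """Return 'giraph:<full_name>' after a basic fully-qualified-name check."""
    if full_name.startswith("giraph:"):
        full_name = full_name[len("giraph:"):]  # avoid a double prefix
    if not _JAVA_CLASS.match(full_name):
        raise ValueError(f"not a fully qualified Java class name: {full_name!r}")
    return "giraph:" + full_name


vformat = giraph_class("com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat")
print(vformat)  # giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat
```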
vertices and edges should point to the vertex input and edge input files. We also provide some example datasets in the gstest repository at GraphScope/gstest.
In this tutorial we only need the p2p dataset. You can download it with:
wget https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.e -O /home/graphscope/p2p-31.e
wget https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.v -O /home/graphscope/p2p-31.v
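Before loading, it can help to check the downloaded files. The sketch below assumes, without confirmation from this tutorial, that each line of p2p-31.v is `<vertex_id> <value>` and each line of p2p-31.e is `<src_id> <dst_id> <value>`, separated by whitespace; inspect the files first and adjust the parsing if they differ. The function name `read_p2p` is hypothetical. It reports edge endpoints that do not appear in the vertex file.

```python
# Assumed format (verify against the real files):
#   p2p-31.v: "<vertex_id> <value>"
#   p2p-31.e: "<src_id> <dst_id> <value>"


def read_p2p(vertex_lines, edge_lines):
    """Parse vertex/edge lines; return (vertices, edges, dangling_ids)."""
    vertices = {}
    for line in vertex_lines:
        parts = line.split()
        if parts:  # skip blank lines
            vertices[int(parts[0])] = float(parts[1])
    edges = []
    for line in edge_lines:
        parts = line.split()
        if parts:
            edges.append((int(parts[0]), int(parts[1]), float(parts[2])))
    # endpoints referenced by edges but missing from the vertex file
    dangling = {v for s, d, _ in edges for v in (s, d) if v not in vertices}
    return vertices, edges, dangling


# With the downloaded files:
# with open("/home/graphscope/p2p-31.v") as vf, open("/home/graphscope/p2p-31.e") as ef:
#     vertices, edges, dangling = read_p2p(vf, ef)

# Tiny made-up sample, for illustration only:
vertices, edges, dangling = read_p2p(["1 0", "2 0"], ["1 2 3", "2 3 1"])
print(len(vertices), len(edges), dangling)  # 2 2 {3}
```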
Then you can load the graph with the GraphScope Python client and query it with a Giraph app:
import graphscope
import os
from graphscope.framework.app import load_app
"""Or launch session in k8s cluster"""
sess = graphscope.session(cluster_type='hosts')
sess.add_lib("/home/graphscope/grape-demo-0.19.0-shaded.jar")
# Remember to put giraph: before class name.
vformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PVertexInputFormat"
eformat = "giraph:com.alibaba.graphscope.example.giraph.format.P2PEdgeInputFormat"
# Replace the paths to p2p-31.v and p2p-31.e with your own paths.
graph = sess.load_from(
vertices=os.path.expandvars("/home/graphscope/p2p-31.v"),
vformat=vformat,
edges=os.path.expandvars("/home/graphscope/p2p-31.e"),
eformat=eformat,
)
graph = graph._project_to_simple(v_prop="vdata", e_prop="data")
giraph_sssp = load_app(algo="giraph:com.alibaba.graphscope.example.giraph.SSSP")
ctx = giraph_sssp(graph, sourceId=6)
ctx.to_numpy('r')
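To sanity-check SSSP results on a small graph, you could compare them with a plain Dijkstra run over the same edges. The sketch below is pure Python and independent of GraphScope; it assumes directed edges with non-negative weights given as `(src, dst, weight)` tuples, and the name `dijkstra` is illustrative. It returns a mapping from vertex ID to distance, omitting unreachable vertices.

```python
import heapq
from collections import defaultdict


def dijkstra(edges, source):
    """Reference SSSP for non-negative weights; edges are (src, dst, weight)."""
    adj = defaultdict(list)
    for src, dst, w in edges:
        adj[src].append((dst, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for dst, w in adj[v]:
            nd = d + w
            if nd < dist.get(dst, float("inf")):
                dist[dst] = nd
                heapq.heappush(heap, (nd, dst))
    return dist  # unreachable vertices are absent


edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
print(dijkstra(edges, 0))  # {0: 0.0, 1: 3.0, 2: 1.0, 3: 4.0}
```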
Run your own Giraph apps
After successfully running the example Giraph SSSP algorithm, you may want to try your own Giraph algorithm on GraphScope (which can run much faster than on Giraph itself).
Develop a Giraph algorithm
You can implement your algorithm against Giraph's original API. For example, you can use the official Giraph example apps:
git clone https://github.com/apache/giraph.git
cd giraph/
mvn package -pl :giraph-examples
Then you can find giraph-examples-1.4.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar in the giraph-examples/target directory.
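The exact jar name depends on the Giraph version and Hadoop profile you build with. A hypothetical helper like `find_giraph_examples_jar` below (not part of GraphScope or Giraph) locates the jar-with-dependencies under `giraph-examples/target`, assuming the checkout directory is named `giraph` as in the clone command; its result can then be passed to `sess.add_lib`.

```python
from pathlib import Path


def find_giraph_examples_jar(repo_root="giraph"):
    """Return the jar-with-dependencies built by `mvn package`, or None."""
    target = Path(repo_root) / "giraph-examples" / "target"
    if not target.is_dir():
        return None  # not built yet, or wrong checkout path
    matches = sorted(target.glob("giraph-examples-*-jar-with-dependencies.jar"))
    return matches[-1] if matches else None


# Usage (assumes a GraphScope session named `sess`):
# jar = find_giraph_examples_jar()
# if jar is not None:
#     sess.add_lib(str(jar))
```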
Although almost all APIs are supported, Giraph-on-GraphScope currently has some limitations:
The graph mutation API is not supported.
Using complex Writable types degrades performance.
Submit to GraphScope
The procedure is almost the same as above, except that you submit your own jar and choose the right InputFormat classes.
import os

import graphscope
from graphscope.framework.app import load_app

# Or launch the session in a Kubernetes cluster with cluster_type='k8s'
sess = graphscope.session(cluster_type='hosts')
# Path to your local jar file; it will be distributed over the cluster
sess.add_lib("/path/to/your-giraph-app.jar")
vformat = "giraph:${vertex-input-format-class-full-name}"
eformat = "giraph:${edge-input-format-class-full-name}"
graph = sess.load_from(
    vertices=os.path.expandvars("${path-to-vertex-file}"),  # local vertex file, distributed over the cluster
    vformat=vformat,
    edges=os.path.expandvars("${path-to-edge-file}"),  # local edge file, distributed over the cluster
    eformat=eformat,
)
graph = graph._project_to_simple(v_prop="vdata", e_prop="data")
giraph_app = load_app(algo="giraph:${giraph-computation-class-full-name}")
# Pass algorithm parameters as keyword arguments, like sourceId=6 in the SSSP example
ctx = giraph_app(graph, ${param-name}=${param-value}, ...)
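In the SSSP example, algorithm parameters are passed as keyword arguments (`sourceId=6`). If you keep your parameters as a `key=value,...` string, a hypothetical helper like `parse_app_params` below (not part of GraphScope) can turn it into a dict for `**` unpacking. Converting integer-looking values to `int` is an assumption of this sketch; check which value types your Giraph computation expects.

```python
def parse_app_params(spec: str) -> dict:
    """Turn 'a=1,b=x' into {'a': 1, 'b': 'x'}; integer-looking values become int."""
    params = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            params[key] = int(value)
        except ValueError:
            params[key] = value  # keep non-integer values as strings
    return params


print(parse_app_params("sourceId=6"))  # {'sourceId': 6}
# Usage (hypothetical app object): ctx = app(graph, **parse_app_params("sourceId=6"))
```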