Persistent storage of graphs on the Kubernetes cluster#
This document gives step-by-step instructions for persistently storing graphs on a Kubernetes cluster with Kubernetes PersistentVolumes and restoring them later. This is useful for graphs that took a long time to compute.
Prerequisites#
You have a Kubernetes cluster on hand. If you don’t have a Kubernetes cluster, please refer to Prepare a Kubernetes cluster for details.
You have the graphscope Python library installed. If you haven’t installed it, please refer to Install GraphScope Client for details.
Create a pv and pvc#
First, create the namespace that will hold the pvc and the GraphScope session.
$ kubectl create namespace graphscope-system
Then create the pv as follows. The pv is a hostPath volume backed by the directory /var/vineyard/dump on the Kubernetes node. You can change it to any other path you want.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: graphscope-pv
  labels:
    app.kubernetes.io/name: test-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /var/vineyard/dump
  storageClassName: manual
EOF
Then create the pvc as follows. Most importantly, do not delete the pvc while you still need the stored graphs, otherwise the data will be lost.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: graphscope-pvc
  namespace: graphscope-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: test-pv
  resources:
    requests:
      storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
EOF
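The pvc only binds to the pv if the two manifests agree on storage class, access modes, capacity, and selector labels. The sketch below is a simplified check of those four rules against the manifests above, written as Python dictionaries. It is not Kubernetes' own binding logic. The helper names `quantity_to_bytes` and `binding_problems` are hypothetical, and only the Ki/Mi/Gi/Ti suffixes are handled.

```python
import json


def quantity_to_bytes(quantity):
    """Convert a binary quantity such as '1Gi' to bytes (Ki/Mi/Gi/Ti only)."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def binding_problems(pv, pvc):
    """Return reasons why `pvc` could not bind to `pv` (empty list if none found)."""
    problems = []
    pv_spec, pvc_spec = pv["spec"], pvc["spec"]
    if pv_spec.get("storageClassName") != pvc_spec.get("storageClassName"):
        problems.append("storageClassName differs")
    if not set(pvc_spec["accessModes"]) <= set(pv_spec["accessModes"]):
        problems.append("PV does not offer every requested access mode")
    requested = quantity_to_bytes(pvc_spec["resources"]["requests"]["storage"])
    if requested > quantity_to_bytes(pv_spec["capacity"]["storage"]):
        problems.append("requested storage exceeds PV capacity")
    labels = pv["metadata"].get("labels", {})
    wanted = pvc_spec.get("selector", {}).get("matchLabels", {})
    for key, value in wanted.items():
        if labels.get(key) != value:
            problems.append("selector label %s=%s not on PV" % (key, value))
    return problems


# The same manifests as above, expressed as dictionaries.
pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {
        "name": "graphscope-pv",
        "labels": {"app.kubernetes.io/name": "test-pv"},
    },
    "spec": {
        "capacity": {"storage": "1Gi"},
        "accessModes": ["ReadWriteOnce"],
        "hostPath": {"path": "/var/vineyard/dump"},
        "storageClassName": "manual",
    },
}
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "graphscope-pvc", "namespace": "graphscope-system"},
    "spec": {
        "selector": {"matchLabels": {"app.kubernetes.io/name": "test-pv"}},
        "resources": {"requests": {"storage": "1Gi"}},
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "manual",
    },
}

print(binding_problems(pv, pvc))  # an empty list means no mismatch was found
print(json.dumps(pvc, indent=2))  # kubectl apply -f also accepts JSON manifests
```

If you change the path, labels, or sizes in the manifests, rerunning a check like this before `kubectl apply` can catch a pvc that would otherwise stay in the Pending phase.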
Store graphs to the pvc#
Once the pv and pvc are created, you can deploy the GraphScope cluster, compute the graphs, and store them to the pvc as follows:
import graphscope
import os
import vineyard
from graphscope.dataset import load_modern_graph
# GS_TEST_DIR must be set in the environment; its value is used
# as the hostPath directory mounted at /testingdata
k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {"path": os.environ["GS_TEST_DIR"], "type": "Directory"},
        "mounts": {"mountPath": "/testingdata"},
    }
}
# create a graphscope session with the external
# vineyard deployment.
#
# Note: num_workers should not be greater than the
# number of nodes in the Kubernetes cluster.
sess = graphscope.session(
    num_workers=1,
    k8s_image_registry="docker.io",
    k8s_image_tag="ccc",
    k8s_namespace="graphscope-system",
    k8s_vineyard_deployment="vineyardd-sample",
    k8s_volumes=k8s_volumes,
)
# load modern graph
graph = load_modern_graph(sess, "/testingdata/modern_graph")
# create the GIE (GraphScope Interactive Engine) instance
interactive = sess.gremlin(graph)
# get the subgraph
sub_graph = interactive.subgraph(
    'g.V().hasLabel("person").outE("knows")'
)
# project the subgraph to a simple graph
simple_g = sub_graph.project(vertices={"person": []}, edges={"knows": []})
pr_result = graphscope.pagerank(simple_g, delta=0.8)
tc_result = graphscope.triangles(simple_g)
# add the PageRank and triangle-counting results as new columns to the property graph
sub_graph.add_column(pr_result, {"Ranking": "r"})
sub_graph.add_column(tc_result, {"TC": "r"})
# print the vineyard ids of the simple graph and the subgraph
# Keep these ids: you need them to restore the graphs next time.
print(simple_g.vineyard_id)
# REMEMBER THIS: 997255889378630
print(sub_graph.vineyard_id)
# REMEMBER THIS: 997163552113975
# store the simple graph and the subgraph to the pvc;
# use the pv path and the pvc name created earlier
sess.store_graphs_to_pvc(
    graphIDs=[
        vineyard.ObjectID(simple_g.vineyard_id),
        vineyard.ObjectID(sub_graph.vineyard_id),
    ],
    path="/var/vineyard/dump",
    pvc_name="graphscope-pvc",
)
# check the simple graph's schema
print(simple_g.schema)
# oid_type: LONG
# vid_type: ULONG
# type: VERTEX
# Label: person
# Properties:
#
# type: EDGE
# Label: knows
# Properties:
# Relations: [Relation(source='person', destination='person')]
# check the subgraph's schema
print(sub_graph.schema)
# oid_type: LONG
# vid_type: ULONG
# type: VERTEX
# Label: person
# Properties: Property(0, name, STRING), Property(1, age, INT), Property(2, id, LONG)
#
# type: VERTEX
# Label: software
# Properties: Property(0, name, STRING), Property(1, lang, STRING), Property(2, id, LONG)
#
# type: EDGE
# Label: created
# Properties: Property(0, eid, LONG), Property(1, weight, DOUBLE)
# Relations: [Relation(source='person', destination='software')]
# type: EDGE
# Label: knows
# Properties: Property(0, eid, LONG), Property(1, weight, DOUBLE)
# Relations: [Relation(source='person', destination='person')]
# close the session
sess.close()
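Because the restore step depends on the vineyard ids printed above, one option is to write them to a small JSON file on the client machine instead of copying them by hand. The following is a minimal sketch using only the standard library. The helpers `save_graph_ids` and `load_graph_ids` and the file name `graph_ids.json` are hypothetical and not part of GraphScope. The sketch assumes the ids can be converted with `int()`.

```python
import json
import os
import tempfile


def save_graph_ids(path, ids):
    """Write a {graph name: vineyard id} mapping to `path` as JSON."""
    with open(path, "w") as f:
        json.dump({name: int(obj_id) for name, obj_id in ids.items()}, f, indent=2)


def load_graph_ids(path):
    """Read the mapping written by save_graph_ids."""
    with open(path) as f:
        return json.load(f)


# Demo with the ids from the example output above, written to a temp directory.
demo_path = os.path.join(tempfile.mkdtemp(), "graph_ids.json")
save_graph_ids(
    demo_path,
    {"simple_g": 997255889378630, "sub_graph": 997163552113975},
)
restored_ids = load_graph_ids(demo_path)
print(restored_ids["simple_g"], restored_ids["sub_graph"])
```

In the store script you would call `save_graph_ids(...)` with `simple_g.vineyard_id` and `sub_graph.vineyard_id` before closing the session. In the restore script you would pass the loaded values to `vineyard.ObjectID(...)`.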
Restore graphs from the pvc#
With the vineyard ids printed above and the pvc name, you can restore the graphs from the pvc as follows.
import graphscope
import os
import vineyard
# create a graphscope session with the external
# vineyard deployment.
#
# Note: num_workers should not be greater than the
# number of nodes in the Kubernetes cluster.
sess = graphscope.session(
    num_workers=1,
    k8s_image_registry="docker.io",
    k8s_image_tag="ccc",
    k8s_namespace="graphscope-system",
    k8s_vineyard_deployment="vineyardd-sample",
)
# load graphs from the pvc
sess.restore_graphs_from_pvc(
    path="/var/vineyard/dump",
    pvc_name="graphscope-pvc",
)
# get the simple graph and subgraph
simple_g = sess.g(vineyard.ObjectID(997255889378630))
sub_graph = sess.g(vineyard.ObjectID(997163552113975))
# check the graphs' schema
print(simple_g.schema)
# oid_type: LONG
# vid_type: ULONG
# type: VERTEX
# Label: person
# Properties:
#
# type: EDGE
# Label: knows
# Properties:
# Relations: [Relation(source='person', destination='person')]
print(sub_graph.schema)
# oid_type: LONG
# vid_type: ULONG
# type: VERTEX
# Label: person
# Properties: Property(0, name, STRING), Property(1, age, INT), Property(2, id, LONG)
#
# type: VERTEX
# Label: software
# Properties: Property(0, name, STRING), Property(1, lang, STRING), Property(2, id, LONG)
#
# type: EDGE
# Label: created
# Properties: Property(0, eid, LONG), Property(1, weight, DOUBLE)
# Relations: [Relation(source='person', destination='software')]
# type: EDGE
# Label: knows
# Properties: Property(0, eid, LONG), Property(1, weight, DOUBLE)
# Relations: [Relation(source='person', destination='person')]
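The schemas printed after the restore should match the ones printed before storing. Rather than comparing them by eye, a small helper can diff the two text dumps. This is a sketch only: `normalize_schema` and `schema_diff` are hypothetical names, and it assumes you kept the text printed by `print(simple_g.schema)` at store time (for example in a file).

```python
import difflib


def normalize_schema(text):
    """Strip surrounding whitespace and drop blank lines from a schema dump."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def schema_diff(before, after):
    """Return unified-diff lines between two schema dumps; empty if they match."""
    return list(
        difflib.unified_diff(
            normalize_schema(before),
            normalize_schema(after),
            fromfile="stored",
            tofile="restored",
            lineterm="",
        )
    )


# Schema text of the simple graph, copied from the output shown above.
stored = """oid_type: LONG
vid_type: ULONG
type: VERTEX
Label: person
Properties:

type: EDGE
Label: knows
Properties:
Relations: [Relation(source='person', destination='person')]"""

# Stand-in for str(simple_g.schema) after the restore; here it differs
# from the stored text only in blank lines, which the helper ignores.
restored = stored.replace("\n\n", "\n")

for line in schema_diff(stored, restored):
    print(line)
print("schemas match:", not schema_diff(stored, restored))
```

In a real restore script the second argument would be `str(simple_g.schema)` from the restored graph.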
Clean up#
If you don’t need the graphs anymore, you can delete the pvc and pv as follows.
$ kubectl delete pvc graphscope-pvc -n graphscope-system
$ kubectl delete pv graphscope-pv