Deploy on K8s Cluster#
To process large-scale graphs in a distributed environment, GraphScope is designed to be deployed on a Kubernetes (K8s) cluster.
As shown in the figure, you can deploy and manage GraphScope workloads through a Python client, which communicates with the GraphScope engines on the K8s cluster through a gRPC service.
A GraphScope cluster on K8s contains a pod running the coordinator and a deployment of GraphScope engines.
The coordinator in GraphScope is the endpoint of the backend. It manages connections from the Python client via gRPC, and is responsible for allocating and releasing the pods for the interactive, analytical, and learning engines.
This document describes how to deploy GraphScope on a K8s cluster.
Prerequisites#
Linux or macOS.
Python 3.7 ~ 3.11.
Install GraphScope Client#
Unlike standalone mode, you only need to install the GraphScope client package.
python3 -m pip install graphscope-client
Tip
If needed, use the Aliyun mirror to speed up the download.
python3 -m pip install graphscope-client -i http://mirrors.aliyun.com/pypi/simple/ \
--trusted-host=mirrors.aliyun.com
Prepare a Kubernetes cluster#
To deploy GraphScope on Kubernetes, you must have a Kubernetes cluster.
Tip
If you already have a K8s cluster, skip this section and continue with the deployment.
We recommend using minikube. Please follow the instructions of minikube to download an appropriate binary for your platform.
Then start minikube with:
minikube start
On macOS, you can just use Docker Desktop, which includes a standalone Kubernetes server and client.
Use this command to verify that minikube is running:
minikube status
The output should show that the cluster is running and that the kubectl context is set to the minikube context. Once started, minikube generates a kubeconfig file that you can use to communicate and interact with the cluster.
The default location of this file is ~/.kube/config, and it should look like this:
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /root/.minikube/ca.crt
    extensions:
    - extension:
        last-update: Thu, 16 Mar 2023 16:44:05 CST
        provider: minikube.sigs.k8s.io
        version: v1.28.0
      name: cluster_info
    server: https://172.21.67.111:8443
  name: minikube
contexts:
- context:
    cluster: minikube
    extensions:
    - extension:
        last-update: Thu, 16 Mar 2023 16:44:05 CST
        provider: minikube.sigs.k8s.io
        version: v1.28.0
      name: context_info
    namespace: default
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: /root/.minikube/profiles/minikube/client.crt
    client-key: /root/.minikube/profiles/minikube/client.key
Deploying GraphScope#
Launch with default parameters#
The GraphScope engines are distributed as Docker images. The images are pulled if they are not already present, so when you run GraphScope on a K8s cluster, make sure the cluster can access the public registry.
A session encapsulates the control and state of the GraphScope engines. It serves as the entry point to GraphScope in the Python client. A session allows you to deploy GraphScope on a K8s cluster and connect to it.
import graphscope
sess = graphscope.session()
By default, the session looks for a kubeconfig file at ~/.kube/config, so the file generated by minikube in the previous step will be used.
As shown above, a session can easily launch a cluster on k8s.
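Before launching, it can help to confirm that a kubeconfig file actually exists at the expected location. The following is a minimal sketch: `find_kubeconfig` is a hypothetical helper, not part of GraphScope, and it only checks the path it is given (by default, the location mentioned above).

```python
import os


def find_kubeconfig(path="~/.kube/config"):
    """Return the expanded kubeconfig path if the file exists, else None.

    Hypothetical helper for illustration; not part of the GraphScope API.
    """
    expanded = os.path.expanduser(path)
    return expanded if os.path.isfile(expanded) else None


# Usage (requires graphscope-client and a reachable cluster):
#   import graphscope
#   if find_kubeconfig() is None:
#       raise SystemExit("No kubeconfig found; start minikube first.")
#   sess = graphscope.session()
```

Failing early with a clear message is usually easier to diagnose than a connection error raised during session startup.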
Frequently used parameters#
Customize image URI#
You may want to use a tag other than the default, or deploy in an intranet environment without internet access. In either case, you need to customize the image URIs.
You can configure the image URIs for the engines using a set of image-related parameters. The default configurations are as follows:
sess = graphscope.session(
    k8s_image_registry="registry.cn-hongkong.aliyuncs.com",
    k8s_image_repository="graphscope",
    k8s_image_tag="0.20.0",
)
See more details in Session.
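To see how the three parameters relate to a full image reference, the sketch below joins them in the conventional `registry/repository/name:tag` form. That GraphScope composes its image references exactly this way, and the image name `coordinator`, are assumptions made for illustration; check the Session documentation for the actual behavior.

```python
def image_uri(registry, repository, name, tag):
    """Join the parts of a container image reference.

    `name` (for example "coordinator") is an assumed image name used
    only for illustration.
    """
    return f"{registry}/{repository}/{name}:{tag}"


print(image_uri("registry.cn-hongkong.aliyuncs.com", "graphscope",
                "coordinator", "0.20.0"))
# registry.cn-hongkong.aliyuncs.com/graphscope/coordinator:0.20.0
```

In an intranet setup, you would typically mirror the images to a private registry and pass that registry's host as `k8s_image_registry`.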
Specify the number of workers#
GraphScope is designed to handle extremely large-scale graphs that cannot fit in the memory of a single worker. To process such graphs, you can increase the number of workers, as well as the CPU and memory of each worker.
To achieve this, use the num_workers parameter together with the engine resource parameters:
sess = graphscope.session(
    num_workers=4,
    k8s_engine_cpu=32,
    k8s_engine_mem="256Gi",
    vineyard_shared_mem="256Gi",
)
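When sizing a session, it is worth checking that the cluster can actually provide what is requested. The sketch below estimates the total engine request, assuming that `k8s_engine_cpu` and `k8s_engine_mem` apply to each worker (an assumption, not something this page confirms). The helper names are hypothetical, only the `Gi` suffix is parsed, and `vineyard_shared_mem` is not counted.

```python
def parse_gi(quantity):
    """Convert a Kubernetes-style quantity such as "256Gi" to GiB (int).

    Only the "Gi" suffix is handled in this sketch.
    """
    if not quantity.endswith("Gi"):
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(quantity[:-2])


def total_request(num_workers, cpu_per_worker, mem_per_worker):
    """Return (total CPU cores, total memory in GiB) across all workers."""
    return num_workers * cpu_per_worker, num_workers * parse_gi(mem_per_worker)


cpus, mem_gib = total_request(4, 32, "256Gi")
print(cpus, mem_gib)  # 128 1024
```

Under that assumption, the example configuration above asks for 128 cores and 1024 GiB of engine memory in total, which is far more than a default minikube cluster provides.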
Provide a kubeconfig file other than the default#
If you want to deploy on a preexisting cluster whose kubeconfig file is in a non-default location, you can specify the path to the kubeconfig file manually:
sess = graphscope.session(k8s_client_config='/path/to/config')
Mount volumes#
Sometimes you may want to use your own dataset from the local disk. For this case, we provide options to mount a host directory into the cluster.
Assume you want to mount ~/test_data on the host machine to /testingdata in the pods. Define a dict as follows, then pass it as k8s_volumes to the session constructor.
Note that the host path is relative to the Kubernetes node. That is, if your cluster was created with a VM driver, you need to copy the directory to the minikube VM or mount the path into the minikube VM. See more details here.
import os
import graphscope

k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {
            "path": os.path.expanduser("~/test_data/"),
            "type": "Directory"
        },
        "mounts": {
            "mountPath": "/testingdata"
        }
    }
}
sess = graphscope.session(k8s_volumes=k8s_volumes)
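If you mount several directories, writing each nested dict by hand gets repetitive. The following sketch wraps the layout shown above in a small function; `host_path_volume` is a hypothetical helper, not part of the GraphScope API, and it simply reproduces the same keys.

```python
import os


def host_path_volume(name, host_path, mount_path):
    """Build a k8s_volumes entry that mirrors the dict layout shown above.

    Hypothetical helper; not part of the GraphScope API.
    """
    return {
        name: {
            "type": "hostPath",
            "field": {
                "path": os.path.expanduser(host_path),
                "type": "Directory",
            },
            "mounts": {"mountPath": mount_path},
        }
    }


k8s_volumes = host_path_volume("data", "~/test_data/", "/testingdata")
# Then: sess = graphscope.session(k8s_volumes=k8s_volumes)
```

Because each call returns a single-key dict, entries for multiple volumes can be merged with `dict.update` before being passed to the session.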
Inspect the deployment#
The launch time of GraphScope depends on the time it takes to pull the necessary Docker images. The pulling time is influenced by the network conditions. Once the images are pulled, you can expect GraphScope to be up and running in less than 10 seconds.
Monitor the status of the deployment with the following command:
kubectl get pods
The output should list the GraphScope pods and their status.
Wait until all pods are running before proceeding.
You can further inspect the status of pods using kubectl describe pod <pod-name>.
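The same check can be scripted. The sketch below parses the JSON form of `kubectl get pods` and reports pods whose phase is not yet `Running`. The function names are hypothetical, the script assumes `kubectl` is on the PATH and pointed at the right namespace, and a `Running` phase does not by itself guarantee that every container is ready.

```python
import json
import subprocess


def not_running(pod_list):
    """Return names of pods whose phase is not "Running".

    `pod_list` is the parsed JSON produced by `kubectl get pods -o json`.
    """
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]


def fetch_pods():
    """Call kubectl and parse its JSON output (requires kubectl on PATH)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Usage: poll until the list is empty, for example
#   import time
#   while not_running(fetch_pods()):
#       time.sleep(2)
```

Keeping the parsing separate from the `kubectl` call makes the check easy to exercise with a saved JSON document.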
That’s it! You now have a running instance of GraphScope in a Kubernetes cluster.
You can use GraphScope to analyze graphs as usual. Check out the Getting Started guide for more information.
Cleaning Up#
When you have finished using GraphScope, you can remove the deployment by closing the session:
sess.close()
You can check whether any resources remain with:
kubectl get deployments
kubectl get statefulsets
kubectl get svc
If resources are still left, you may need to delete them manually:
kubectl delete deployment <deployment-name>
kubectl delete statefulsets <statefulsets-name>
kubectl delete svc <svc-name>
To stop and delete the minikube cluster, run:
minikube stop
minikube delete