Dev and Test#
This guide walks you through how the code is organized, where the key functions and important code live, and how to build and test the analytical engine.
For simplicity, we suggest you use a prebuilt Docker image with the necessary dependencies installed.
docker run --name dev -it --shm-size=4096m registry.cn-hongkong.aliyuncs.com/graphscope/graphscope-dev:latest
Alternatively, you can manually install all dependencies on your local machine. Please refer to Dev Environment for more options to set up a dev environment.
After the environment is prepared, clone the repository and enter the
analytical_engine directory of the repository.
git clone https://github.com/alibaba/GraphScope.git
cd GraphScope/analytical_engine
Understanding the Codebase#
Since the analytical engine inherits from GRAPE, it requires libgrape-lite as a dependency. Please note that the core functionalities of libgrape-lite, such as graph structures, graph partitioners, workers, communication between workers, and applications, are heavily reused in the analytical engine of GraphScope.
If you want to fully understand the analytical engine, it is highly recommended that you start from libgrape-lite.
The code located in the
analytical_engine directory functions like extensions to libgrape-lite, thereby making it full-fledged with the following enhancements:
K8s support to enable management by the GraphScope coordinator;
Many built-in algorithms, whereas libgrape-lite ships with only the 6 analytical algorithms of the LDBC Graphalytics Benchmark;
Support for property graphs and their flattened/projected graphs;
Java support, thus making it possible to execute applications written for Giraph/GraphX on GraphScope.
The code is organized as follows:
apps contains various built-in algorithms/applications.
core: The core directory contains extensions to the libgrape-lite library and is organized in the same way as the libgrape-lite directory. The extensions are located in the same directory as their base in libgrape-lite. More specifically,
core/app contains classes related to applications, which serve as base classes to inherit from when implementing new applications.
core/communication contains extensions to the communication layers.
core/cuda contains a suite of graph structure implementations and communications on the GPU for GPU-accelerated computations.
core/fragment contains the extended fragments and their loaders, e.g., the mutable graph fragment.
core/parallel contains the parallel sub-layer for computation and communications, such as a helper class for parallel execution with threads and the logic for managing and syncing message buffers between workers.
core/serialization extends the serialization in libgrape-lite to include property graphs.
core/utils contains utility functions and classes.
core/vertex_map contains vertex maps designed to manage the mapping between the original vertex ID and the internal identifier of a vertex.
core/worker contains the worker, which executes the applications locally and communicates with other workers.
frame contains the frames used to wrap the libgrape-lite library for integration into GraphScope. This is necessary because libgrape-lite relies heavily on C++ templates to define applications and graphs, which is inadequate for loading property graphs and applications in GraphScope scenarios: property graphs usually have multiple label and property types, which cannot be determined before loading. For this reason, GraphScope uses JIT technology to compile property graphs and their associated applications at runtime, and the frames serve as wrappers to facilitate these tasks.
java contains Java implementations. GraphScope supports implementing applications with Pregel/Giraph and GraphX APIs. In addition, existing Giraph/GraphX applications (jars) can be run on GraphScope without any modification. Read more about Java support here.
benchmarks contains code related to performance testing and benchmarking.
cmake contains CMake scripts for configuring the build.
test contains test cases and scripts.
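To make the core/vertex_map entry in the list above more concrete, here is a minimal sketch of the idea of mapping original vertex IDs to dense internal identifiers. The class `ToyVertexMap` and its methods are hypothetical names chosen for illustration; they are not the classes shipped in core/vertex_map, whose interfaces may differ considerably.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy vertex map (hypothetical, NOT the GraphScope/libgrape-lite class).
// Original IDs are arbitrary strings; internal IDs are dense indices 0..n-1.
class ToyVertexMap {
 public:
  // Returns the internal id of `oid`, assigning the next dense id if unseen.
  std::size_t GetOrAdd(const std::string& oid) {
    auto it = oid_to_id_.find(oid);
    if (it != oid_to_id_.end()) return it->second;
    std::size_t id = id_to_oid_.size();
    oid_to_id_.emplace(oid, id);
    id_to_oid_.push_back(oid);
    return id;
  }

  // Looks up an already registered original id; returns false if unknown.
  bool Find(const std::string& oid, std::size_t* id) const {
    auto it = oid_to_id_.find(oid);
    if (it == oid_to_id_.end()) return false;
    *id = it->second;
    return true;
  }

  // Reverse lookup: internal id back to the original id.
  const std::string& OriginalId(std::size_t id) const {
    assert(id < id_to_oid_.size());
    return id_to_oid_[id];
  }

  std::size_t Size() const { return id_to_oid_.size(); }

 private:
  std::unordered_map<std::string, std::size_t> oid_to_id_;
  std::vector<std::string> id_to_oid_;
};
```

Dense internal identifiers let per-vertex data live in plain arrays, which is one plausible reason for keeping such a mapping separate from the graph structure itself.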
The figure above illustrates the key components of libgrape-lite (as well as of the analytical engine in GraphScope) and how they work. More specifically,
Fragment is a partition of the graph data and is the object that a computing node processes.
MessageManager manages communication strategies and is responsible for message communication and state synchronization between fragments, either explicitly or implicitly.
Application is the main logic of the user’s application. In an application, the user can access the local Fragment or send/receive messages through the message manager.
Worker is a class that is responsible for loading the graph (Fragment), calling the application to compute on the local Fragment, and communicating with Workers on other computing nodes through the message manager.
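The interplay of the components described above can be sketched with a single-process toy model. Everything here is an assumption made for illustration: the types `Fragment`, `MessageManager`, `MinLabelApp`, and `Worker`, the modulo partitioning, and the function `RunMinLabel` are simplified stand-ins, not the libgrape-lite or GraphScope classes. The application propagates the minimum vertex ID through each connected component.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy stand-ins only; NOT the libgrape-lite / GraphScope API.
struct Fragment {                              // one partition of the graph
  int id;
  std::vector<int> vertices;                   // vertices owned by this partition
  std::map<int, std::vector<int>> out_edges;   // owned vertex -> neighbor ids
};

class MessageManager {                         // buffers messages between fragments
 public:
  explicit MessageManager(std::size_t n) : outbox_(n), inbox_(n) {}
  void Send(int dst_fragment, int vertex, int value) {
    outbox_[dst_fragment].push_back({vertex, value});
  }
  const std::vector<std::pair<int, int>>& Inbox(int fragment) const {
    return inbox_[fragment];
  }
  // Delivers buffered messages; returns true if any message was delivered.
  bool Sync() {
    bool delivered = false;
    for (std::size_t i = 0; i < inbox_.size(); ++i) {
      inbox_[i].swap(outbox_[i]);
      outbox_[i].clear();
      delivered = delivered || !inbox_[i].empty();
    }
    return delivered;
  }
 private:
  std::vector<std::vector<std::pair<int, int>>> outbox_, inbox_;
};

struct MinLabelApp {                           // the user's application logic
  std::map<int, int> label;                    // owned vertex -> current label
  void Init(const Fragment& frag, MessageManager& mm, int num_fragments) {
    for (int v : frag.vertices) { label[v] = v; Propagate(frag, v, mm, num_fragments); }
  }
  void Step(const Fragment& frag, MessageManager& mm, int num_fragments) {
    for (const auto& msg : mm.Inbox(frag.id)) {
      if (msg.second < label[msg.first]) {     // a smaller label arrived
        label[msg.first] = msg.second;
        Propagate(frag, msg.first, mm, num_fragments);
      }
    }
  }
 private:
  void Propagate(const Fragment& frag, int v, MessageManager& mm, int num_fragments) {
    auto it = frag.out_edges.find(v);
    if (it == frag.out_edges.end()) return;
    // Assumed partitioning: vertex v is owned by fragment v % num_fragments.
    for (int nbr : it->second) mm.Send(nbr % num_fragments, nbr, label[v]);
  }
};

class Worker {                                 // loads a fragment and runs the app on it
 public:
  Worker(int id, int num_fragments, int num_vertices,
         const std::vector<std::pair<int, int>>& edges) : num_fragments_(num_fragments) {
    frag_.id = id;
    for (int v = 0; v < num_vertices; ++v)
      if (v % num_fragments == id) frag_.vertices.push_back(v);
    for (const auto& e : edges) {              // edges are treated as undirected
      if (e.first % num_fragments == id) frag_.out_edges[e.first].push_back(e.second);
      if (e.second % num_fragments == id) frag_.out_edges[e.second].push_back(e.first);
    }
  }
  void Init(MessageManager& mm) { app_.Init(frag_, mm, num_fragments_); }
  void Step(MessageManager& mm) { app_.Step(frag_, mm, num_fragments_); }
  const std::map<int, int>& Labels() const { return app_.label; }
 private:
  Fragment frag_;
  MinLabelApp app_;
  int num_fragments_;
};

// Drives all workers in rounds until no messages are exchanged.
std::map<int, int> RunMinLabel(int num_vertices, int num_fragments,
                               const std::vector<std::pair<int, int>>& edges) {
  MessageManager mm(static_cast<std::size_t>(num_fragments));
  std::vector<Worker> workers;
  for (int i = 0; i < num_fragments; ++i)
    workers.emplace_back(i, num_fragments, num_vertices, edges);
  for (auto& w : workers) w.Init(mm);
  while (mm.Sync())
    for (auto& w : workers) w.Step(mm);
  std::map<int, int> result;
  for (const auto& w : workers) result.insert(w.Labels().begin(), w.Labels().end());
  return result;
}
```

For example, `RunMinLabel(6, 2, {{0, 1}, {1, 2}, {3, 4}})` labels vertices 0, 1, and 2 with 0, vertices 3 and 4 with 3, and the isolated vertex 5 with 5. In a real deployment each worker would run on its own computing node and the message exchange would cross the network; the sketch keeps everything in one process to show only the division of responsibilities.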
You are encouraged to fork the repo and make modifications on your own fork.
It is much easier to begin with a small change, such as revising a specific algorithm or adding a new one, and then gradually move on to more complex changes. We suggest avoiding large changes in a single commit.
If you want to contribute to the repo, please refer to Contributing to GraphScope to get more details.
Building Analytical Engine#
With the gs command-line utility, you can build the analytical engine of GraphScope with a single command.
# Clone a repo if needed
# git clone https://github.com/alibaba/graphscope
# cd graphscope
python3 gsctl.py make analytical
The code of the analytical engine is a CMake project, with a CMakeLists.txt in its root directory (/analytical_engine). After building with gs, the built artifacts include grape_engine together with shared libraries, and a set of test binaries if you chose to build the tests.
You can install it to a location of your choice with:
python3 gsctl.py make analytical-install --install-prefix /usr/local
Take a look at the CMakeLists.txt of the analytical engine if you want to investigate the engine further or customize the build.
The analytical engine has a suite of tests to ensure the correctness of the code, from unit tests to graph algorithm correctness tests. You can easily test the newly built artifacts with a single command.
First, set GRAPHSCOPE_HOME to the root of your local repository:
export GRAPHSCOPE_HOME=`pwd` # Here the `pwd` is the root path of GraphScope repository
See more about GRAPHSCOPE_HOME in Run tests. Then run the test suite:
python3 gsctl.py test analytical
This downloads the test dataset to /tmp/gstest (if it does not already exist), runs multiple algorithms against various graphs, and compares the results with the ground truth.
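The comparison against ground truth can be pictured with a small sketch. The source does not describe how the test scripts actually compare results, so the helper `CountMismatches`, its tolerance parameter, and the per-vertex map representation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>

// Hypothetical helper, NOT part of the GraphScope test suite.
// Counts vertices whose computed value differs from the ground truth by more
// than `tolerance`, plus vertices present in only one of the two maps.
std::size_t CountMismatches(const std::map<long, double>& result,
                            const std::map<long, double>& truth,
                            double tolerance) {
  assert(tolerance >= 0.0);
  std::size_t mismatches = 0;
  for (const auto& kv : truth) {
    auto it = result.find(kv.first);
    if (it == result.end() || std::fabs(it->second - kv.second) > tolerance) {
      ++mismatches;  // missing vertex or value outside the tolerance
    }
  }
  for (const auto& kv : result) {
    if (truth.find(kv.first) == truth.end()) {
      ++mismatches;  // vertex that the ground truth does not contain
    }
  }
  return mismatches;
}
```

A tolerance is useful for algorithms with floating-point results such as PageRank, where small rounding differences are expected; for integer-valued results a tolerance of zero would give an exact comparison.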