Import and Export Graph Data of Neo4j with GraphAr

GraphAr is an open-source, standard data file format for graph data storage and retrieval. It defines a standardized file format for graph data and provides a set of interfaces for generating, accessing, and transforming these formatted files. This post is a quick guide to importing and exporting Neo4j graph data with GraphAr.

What is GraphAr?

GraphAr (Graph Archive, abbreviated as GAR) defines a standardized, system-independent file format for graph data and provides a set of interfaces for generating, accessing, and converting these formatted files. It helps graph computing applications and existing systems build and access graph data conveniently. GraphAr can be used as a direct data source for graph computing applications, as well as for importing, exporting, and persistently storing graph data, which reduces the overhead of collaboration between graph systems. The following figure shows GraphAr used as a graph data archiving format and as a data source for graph computing applications: with GraphAr, users can quickly import/export graph data from/to graph databases such as Neo4j, and use GraphAr as a data source for graph computing applications such as GraphScope.

[Figure: graph data exchange with GraphAr]

GraphAr Spark SDK

GraphAr Spark SDK uses Maven as its build system and requires Java 8 or higher. A build script is provided; to build GraphAr Spark SDK, run the following commands:

# download GraphAr Spark SDK source code
git clone https://github.com/alibaba/GraphAr.git
cd GraphAr
cd spark

# build GraphAr Spark SDK
./scripts/build.sh

Export/Import Graph Data of Neo4j with GraphAr

Neo4j is a popular graph database, and it provides the Neo4j Spark Connector for moving graph data between Neo4j and Spark. Because the connector exposes Neo4j data as Spark DataFrames, GraphAr Spark SDK can work alongside it to export graph data from Neo4j to GraphAr and to import GraphAr data back into Neo4j.

To demonstrate the export, we use the Neo4j movie graph as an example. The following figure shows the movie graph in Neo4j:

[Figure: the Neo4j movie graph]

Deploy Neo4j

Before exporting graph data from Neo4j with GraphAr, we need a running Neo4j instance. We provide a script that deploys Neo4j to the HOME directory; if you already have a Neo4j instance, you can skip this step. To deploy Neo4j, run the following commands:

./scripts/get-neo4j-to-home.sh
export NEO4J_HOME="${HOME}/neo4j-community-4.4.23"
export PATH="${NEO4J_HOME}/bin:${PATH}"

Set the initial password of Neo4j:

neo4j-admin set-initial-password xxxx # set your password here

Load Movie Graph Data to Neo4j

Neo4j provides the movie graph as one of its example datasets. To load it into Neo4j, run the following command:

./scripts/deploy-neo4j-movie-data.sh

After loading the movie graph, you can check it in Neo4j Browser (by default at http://localhost:7474). Log in with the username neo4j and the password you set in the previous step. The following figure shows the movie graph in Neo4j Browser:

[Figure: the movie graph in Neo4j Browser]

Export Graph Data of Neo4j with GraphAr

GraphAr provides an example class, Neo4j2GraphAr, that exports the Neo4j movie graph to GraphAr. Its main method is shown below:

import com.alibaba.graphar.graph.GraphWriter

import org.apache.spark.sql.SparkSession

object Neo4j2GraphAr {
  def main(args: Array[String]): Unit = {
    // connect to the Neo4j instance; credentials come from the
    // NEO4J_USR and NEO4J_PWD environment variables
    val spark = SparkSession
      .builder()
      .appName("Neo4j to GraphAr for Movie Graph")
      .config("neo4j.url", "bolt://localhost:7687")
      .config("neo4j.authentication.type", "basic")
      .config(
        "neo4j.authentication.basic.username",
        sys.env.getOrElse("NEO4J_USR", sys.error("NEO4J_USR is not set"))
      )
      .config(
        "neo4j.authentication.basic.password",
        sys.env.getOrElse("NEO4J_PWD", sys.error("NEO4J_PWD is not set"))
      )
      .config("spark.master", "local")
      .getOrCreate()

    // initialize a graph writer
    val writer: GraphWriter = new GraphWriter()

    // put movie graph data into writer
    readAndPutDataIntoWriter(writer, spark)

    // output directory
    val outputPath: String = args(0)
    // vertex chunk size
    val vertexChunkSize: Long = args(1).toLong
    // edge chunk size
    val edgeChunkSize: Long = args(2).toLong
    // file type
    val fileType: String = args(3)

    // write in graphar format
    writer.write(
      outputPath,
      spark,
      "MovieGraph",
      vertexChunkSize,
      edgeChunkSize,
      fileType
    )
  }
}

The readAndPutDataIntoWriter method reads the graph data from Neo4j and puts it into a GraphWriter instance; its implementation can be found in the Neo4j2GraphAr.scala file. The main method then reads four positional arguments (the output directory, the vertex chunk size, the edge chunk size, and the file type) and writes the graph in GraphAr format under the name MovieGraph.
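When edges are read out of Neo4j with a Cypher query, their endpoints are typically identified by a key property, such as a person's name and a movie's title, whereas GraphAr edge files refer to endpoints by vertex index (the vertices of each label are numbered 0, 1, 2, ...). The sketch below illustrates that key-to-index conversion with plain Scala collections. We assume GraphWriter performs something equivalent internally, but this is not its implementation; all names (EdgeEncodingSketch, KeyedEdge, IndexedEdge, assignIndices, encode) are hypothetical, and the sample values are illustrative.

```scala
// Hypothetical, stdlib-only sketch of key-to-index edge encoding.
// It is NOT GraphAr's GraphWriter implementation.
object EdgeEncodingSketch {
  // An edge as returned by a Cypher query: endpoints identified by a key property.
  final case class KeyedEdge(srcKey: String, dstKey: String)
  // The same edge after encoding: endpoints identified by vertex index.
  final case class IndexedEdge(srcIndex: Long, dstIndex: Long)

  /** Assign contiguous indices 0, 1, 2, ... to vertex keys in the given order. */
  def assignIndices(keys: Seq[String]): Map[String, Long] =
    keys.zipWithIndex.map { case (k, i) => k -> i.toLong }.toMap

  /** Replace endpoint keys with vertex indices; fail loudly on unknown endpoints. */
  def encode(
      edges: Seq[KeyedEdge],
      srcIndex: Map[String, Long],
      dstIndex: Map[String, Long]
  ): Seq[IndexedEdge] =
    edges.map { e =>
      IndexedEdge(
        srcIndex.getOrElse(e.srcKey, sys.error(s"unknown source vertex: ${e.srcKey}")),
        dstIndex.getOrElse(e.dstKey, sys.error(s"unknown target vertex: ${e.dstKey}"))
      )
    }

  def main(args: Array[String]): Unit = {
    // illustrative Person and Movie vertices, indexed per label
    val people = assignIndices(Seq("Keanu Reeves", "Carrie-Anne Moss"))
    val movies = assignIndices(Seq("The Matrix"))
    // illustrative ACTED_IN edges keyed by name and title
    val actedIn = Seq(
      KeyedEdge("Keanu Reeves", "The Matrix"),
      KeyedEdge("Carrie-Anne Moss", "The Matrix")
    )
    encode(actedIn, people, movies).foreach(println)
  }
}
```

Running main prints IndexedEdge(0,0) and IndexedEdge(1,0). In a Spark job the same step would be a join between the edge DataFrame and the vertex DataFrames rather than an in-memory Map lookup.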

To run the Neo4j2GraphAr example, set the Neo4j credentials and run the script:

export NEO4J_USR="neo4j"
export NEO4J_PWD="xxxx" # the password you set in the previous step
./scripts/run-neo4j2graphar.sh

The example converts the movie graph in Neo4j to GraphAr format and saves it to the directory /tmp/graphar/neo4j2graphar.
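The vertexChunkSize argument read in main controls how each vertex table is split into files. In GraphAr, the vertices of a label are numbered with contiguous indices starting from 0 and stored in chunks that each hold vertexChunkSize vertices, except possibly the last one; edgeChunkSize plays a similar role for edge files, which this sketch does not cover. The helper names below (VertexChunkSketch, numChunks, chunkOf, indexRange) are hypothetical and only illustrate the arithmetic.

```scala
// Hypothetical sketch of GraphAr-style vertex chunking arithmetic; not a GraphAr API.
object VertexChunkSketch {
  /** Number of chunks needed for `vertexCount` vertices (ceiling division). */
  def numChunks(vertexCount: Long, chunkSize: Long): Long = {
    require(chunkSize > 0, "chunk size must be positive")
    require(vertexCount >= 0, "vertex count must be non-negative")
    (vertexCount + chunkSize - 1) / chunkSize
  }

  /** Index of the chunk that holds the vertex with the given index. */
  def chunkOf(vertexIndex: Long, chunkSize: Long): Long = {
    require(chunkSize > 0 && vertexIndex >= 0)
    vertexIndex / chunkSize
  }

  /** Half-open index range [start, end) stored in chunk `chunk`; the last chunk may be short. */
  def indexRange(chunk: Long, chunkSize: Long, vertexCount: Long): (Long, Long) = {
    val start = chunk * chunkSize
    (start, math.min(start + chunkSize, vertexCount))
  }

  def main(args: Array[String]): Unit = {
    val (count, size) = (10L, 4L)
    val chunks = numChunks(count, size)
    println(s"$count vertices, chunk size $size -> $chunks chunks")
    for (c <- 0L until chunks) {
      val (startIndex, endIndex) = indexRange(c, size, count)
      println(s"chunk$c: indices [$startIndex, $endIndex)")
    }
  }
}
```

With 10 vertices and a chunk size of 4, main prints three chunks covering indices [0, 4), [4, 8) and [8, 10). A larger chunk size means fewer, bigger files; a smaller one allows finer-grained reads.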

Import Graph Data into Neo4j with GraphAr

Similarly, GraphAr provides an example class, GraphAr2Neo4j, that imports the movie graph from GraphAr into Neo4j. Its main method is shown below:

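import com.alibaba.graphar.GraphInfo
import com.alibaba.graphar.graph.GraphReader

import org.apache.spark.sql.SparkSession

object GraphAr2Neo4j {

  def main(args: Array[String]): Unit = {
    // connect to the Neo4j instance; credentials come from the
    // NEO4J_USR and NEO4J_PWD environment variables
    val spark = SparkSession
      .builder()
      .appName("GraphAr to Neo4j for Movie Graph")
      .config("neo4j.url", "bolt://localhost:7687")
      .config("neo4j.authentication.type", "basic")
      .config(
        "neo4j.authentication.basic.username",
        sys.env.getOrElse("NEO4J_USR", sys.error("NEO4J_USR is not set"))
      )
      .config(
        "neo4j.authentication.basic.password",
        sys.env.getOrElse("NEO4J_PWD", sys.error("NEO4J_PWD is not set"))
      )
      .config("spark.master", "local")
      .getOrCreate()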

    // path to the graph information file
    val graphInfoPath: String = args(0)
    val graphInfo = GraphInfo.loadGraphInfo(graphInfoPath, spark)

    // GraphAr edges store src and dst as vertex indices. To map them back to
    // vertices, read the vertex data together with its index column (the
    // third argument, true).
    val (vertexData, edgeData) = GraphReader.read(graphInfoPath, spark, true)

    putVertexDataIntoNeo4j(graphInfo, vertexData, spark)
    putEdgeDataIntoNeo4j(graphInfo, vertexData, edgeData, spark)
  }
}

The putVertexDataIntoNeo4j method writes the GraphAr vertex data into Neo4j, and the putEdgeDataIntoNeo4j method writes the GraphAr edge data into Neo4j. Their implementations can be found in the GraphAr2Neo4j.scala file.
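The vertex data is read with its index column because GraphAr edges reference their source and target by vertex index, and those indices mean nothing to Neo4j: each edge has to be joined back to its endpoint vertices to recover a property that can match the corresponding nodes. As a rough sketch, assuming putEdgeDataIntoNeo4j does something conceptually similar (its actual code in GraphAr2Neo4j.scala presumably works on Spark DataFrames), the join looks like this with plain Scala collections; EdgeDecodingSketch and its members are hypothetical names, and the sample values are illustrative.

```scala
// Hypothetical, stdlib-only sketch of index-to-key edge decoding.
// It is NOT the putEdgeDataIntoNeo4j implementation.
object EdgeDecodingSketch {
  // A vertex read with its GraphAr index plus the property used to match it in Neo4j.
  final case class IndexedVertex(index: Long, key: String)
  // An edge as stored in GraphAr: endpoints are vertex indices.
  final case class IndexedEdge(srcIndex: Long, dstIndex: Long)
  // An edge ready to be written to Neo4j: endpoints are key property values.
  final case class KeyedEdge(srcKey: String, dstKey: String)

  /** Join each edge with the source and target vertex tables on the vertex index. */
  def decode(
      edges: Seq[IndexedEdge],
      srcVertices: Seq[IndexedVertex],
      dstVertices: Seq[IndexedVertex]
  ): Seq[KeyedEdge] = {
    val srcKeyOf = srcVertices.map(v => v.index -> v.key).toMap
    val dstKeyOf = dstVertices.map(v => v.index -> v.key).toMap
    edges.map { e =>
      KeyedEdge(
        srcKeyOf.getOrElse(e.srcIndex, sys.error(s"no source vertex with index ${e.srcIndex}")),
        dstKeyOf.getOrElse(e.dstIndex, sys.error(s"no target vertex with index ${e.dstIndex}"))
      )
    }
  }

  def main(args: Array[String]): Unit = {
    // illustrative Person and Movie vertices with their GraphAr indices
    val people = Seq(IndexedVertex(0L, "Keanu Reeves"), IndexedVertex(1L, "Carrie-Anne Moss"))
    val movies = Seq(IndexedVertex(0L, "The Matrix"))
    // illustrative ACTED_IN edges as stored in GraphAr
    val actedIn = Seq(IndexedEdge(0L, 0L), IndexedEdge(1L, 0L))
    decode(actedIn, people, movies).foreach(println)
  }
}
```

Running main prints KeyedEdge(Keanu Reeves,The Matrix) and KeyedEdge(Carrie-Anne Moss,The Matrix). The key chosen for matching should be unique per label in Neo4j; otherwise one GraphAr edge could attach to several nodes.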

To run the GraphAr2Neo4j example, keep NEO4J_USR and NEO4J_PWD set and run the following command:

./scripts/run-graphar2neo4j.sh

The movie graph stored in GraphAr is now imported into Neo4j. As in the previous step, you can inspect it in Neo4j Browser.

Conclusion

GraphAr defines a simple, standard data file format for graph data storage and retrieval, together with a set of interfaces for generating, accessing, and converting these files. It can serve as a direct data source for graph computing applications and as a format for importing, exporting, and persistently storing graph data, which reduces the overhead of collaboration between graph systems. In this post, we showed how to import and export Neo4j graph data with GraphAr. GraphAr Spark SDK contains many other examples; refer to them to learn about further use cases.