GraphScope - graphscope blog

We are glad to announce GraphScope 0.25.0, a release that brings significant improvements to the platform. Starting with this version, our release notes are divided into two parts: updates to the original GraphScope framework (the graph analytics engine GAE, the graph interactive engine GIE, and the graph learning engine GLE), and new product features built on the GraphScope Flex architecture.

Under the original GraphScope framework, GIE now lets users express queries in natural language, which are automatically translated into Cypher. GIE also runs certain queries faster thanks to optimizations in the persistent storage Groot. For graph learning tasks, we have integrated the latest GraphLearn-for-PyTorch (GLTorch) engine, which uses GPUs to accelerate graph sampling and feature extraction, improving the training and inference performance of graph neural networks. Under the GraphScope Flex architecture, GraphScope Interactive, the graph query engine designed for high-concurrency scenarios, has received a series of functionality and usability improvements.

We highlight the following improvements included in this release:

1. Integration of the Graph Interactive Query Engine GIE with LLMs

Graph queries are typically written in Cypher or Gremlin, which raises the barrier to entry for new users. With Large Language Models (LLMs) being rapidly adopted across industries, we now use an LLM to let users express queries in natural language; each query is automatically translated into Cypher and executed on GIE.

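from graphscope.langchain_prompt.query import query_to_cypher

# `graph`, `endpoint`, and `api_key` are assumed to be defined earlier:
# the loaded graph, and the address and credential of the LLM service.
question = "Who is the son of Jia Baoyu?"
cypher_sentence = query_to_cypher(graph, question, endpoint=endpoint, api_key=api_key)
print(cypher_sentence)
# Example output: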
# MATCH (p:Person)-[:son_of]->(q:Person)
# WHERE p.name = 'Jia Baoyu'
# RETURN q.name
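
To give a rough idea of what such a translation step can involve, the sketch below assembles a schema-aware prompt that could be sent to an LLM. It is not the implementation behind `query_to_cypher`: the function `build_cypher_prompt`, the layout of the schema dictionary, and the prompt wording are all assumptions made for illustration.

```python
# Hypothetical sketch: compose a text-to-Cypher prompt from a graph schema.
# Nothing here is part of the GraphScope API.

def build_cypher_prompt(schema: dict, question: str) -> str:
    """Return a prompt asking an LLM to translate `question` into Cypher."""
    vertex_lines = [
        f"- (:{label}) with properties {', '.join(props)}"
        for label, props in schema["vertices"].items()
    ]
    edge_lines = [
        f"- (:{src})-[:{label}]->(:{dst})"
        for src, label, dst in schema["edges"]
    ]
    return "\n".join(
        [
            "Translate the question into a single Cypher query.",
            "Use only the labels and properties listed below.",
            "Vertex labels:",
            *vertex_lines,
            "Edge labels:",
            *edge_lines,
            f"Question: {question}",
            "Cypher:",
        ]
    )


# Assumed toy schema that matches the labels in the example above.
schema = {
    "vertices": {"Person": ["name"]},
    "edges": [("Person", "son_of", "Person")],
}
prompt = build_cypher_prompt(schema, "Who is the son of Jia Baoyu?")
print(prompt)
```

Listing the schema in the prompt is one common way to keep the model from inventing labels or properties that the graph does not contain.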

2. Integration with GraphLearn-for-Pytorch (GLTorch)

GLTorch is a PyTorch-based graph neural network framework optimized for single-machine, multi-GPU scenarios. It uses GPUs to accelerate graph sampling and feature extraction in graph neural networks. Its API is also compatible with PyG, so models originally written against the PyG API can run on GraphScope with minimal code changes.

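import graphscope as gs
from graphscope.dataset import load_ogbn_arxiv

# Load the ogbn_arxiv graph as an example.
g = load_ogbn_arxiv()
# Specify the learning engine.
glt_graph = gs.graphlearn_torch(
    g,
    edges=[("paper", "citation", "paper")],
    # The 128 feature columns feat_0 ... feat_127 of each paper vertex.
    node_features={"paper": [f"feat_{i}" for i in range(128)]},
    node_labels={"paper": "label"},
    edge_dir="out",
    # Hold out 10% of the nodes for validation and 10% for testing.
    random_node_split={"num_val": 0.1, "num_test": 0.1},
)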
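
As a rough illustration of what the two fractions in `random_node_split` mean, the sketch below partitions node IDs in plain Python. It assumes that `num_val` and `num_test` are shares of all nodes and that everything left over is used for training. The helper `split_nodes` and its `seed` parameter are hypothetical; this is not the GraphScope or GLTorch implementation.

```python
import random


def split_nodes(num_nodes, num_val=0.1, num_test=0.1, seed=0):
    """Randomly partition node IDs 0..num_nodes-1 into train/val/test lists."""
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_val = int(num_nodes * num_val)
    n_test = int(num_nodes * num_test)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]  # the remainder is used for training
    return train, val, test


train, val, test = split_nodes(1000)
print(len(train), len(val), len(test))  # 800 100 100
```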

3. Improvements to GraphScope Interactive

GraphScope Interactive is dedicated to providing users with outstanding query processing capabilities in high-concurrency scenarios. In this update, we have made the following improvements:

Added a cache mechanism so that a query executed multiple times is compiled only once;
Added support for string-type properties as the primary key of vertices;
Added user documentation for GraphScope Interactive.
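
The idea behind the compilation cache in the first item can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GraphScope Interactive implementation: the class `CompiledQueryCache`, the stand-in `fake_compile`, and the choice to key the cache on the exact query text are all hypothetical.

```python
from typing import Callable, Dict


class CompiledQueryCache:
    """Hypothetical cache mapping query text to its compiled form."""

    def __init__(self, compile_fn: Callable[[str], object]) -> None:
        self._compile_fn = compile_fn
        self._cache: Dict[str, object] = {}
        self.compile_calls = 0  # counts how often compilation actually runs

    def get(self, query: str) -> object:
        # Compile only on the first request; later requests reuse the result.
        if query not in self._cache:
            self.compile_calls += 1
            self._cache[query] = self._compile_fn(query)
        return self._cache[query]


def fake_compile(query: str) -> str:
    # Stand-in for an expensive compilation step.
    return f"plan<{query}>"


cache = CompiledQueryCache(fake_compile)
for _ in range(3):
    plan = cache.get("MATCH (n) RETURN count(n)")
print(cache.compile_calls)  # 1
```

Keying on the exact text is the simplest choice; a real engine might normalize the query or separate its parameters first so that more requests hit the cache.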

4. Performance Optimization of Persistent Storage Groot

Optimized the count() operator for vertices and edges, which speeds up queries such as g.V().count() in GIE;
Added Groot APIs that report disk usage, including used disk capacity and remaining available disk capacity.

For the full list of improvements in this release, please refer to the complete changelog.