Skip to Content
ExtensionsGDS (Graph Data Science) Extension

GDS (Graph Data Science) Extension

Since NeuG v0.1.3, we have introduced the GDS extension, which provides a collection of graph algorithms that run on projected subgraphs. It enables common analytical workloads — community detection, centrality analysis, shortest-path computation — directly inside NeuG through the CALL interface.

Quick Start

-- 1. Load the extension LOAD gds; -- 2. Project a subgraph CALL project_graph( 'social', ['person'], {'[person, knows, person]': ''} ); -- 3. Run an algorithm CALL page_rank('social', {max_iterations: 20}) RETURN node.fName, rank; -- 4. Clean up CALL drop_projected_graph('social');

Graph Projection

Before running any GDS algorithm, you must create a projected graph — a named subgraph view that defines which node labels, edge triplets, and optional predicates the algorithm operates on.

project_graph

CALL project_graph( '<graph_name>', <vertex_entries>, <edge_entries> );

Parameters:

ParameterTypeDescription
graph_nameSTRINGA unique alias for the projected subgraph
node_entriesLIST or MAPNode labels, optionally with predicates
edge_entriesMAPEdge triplets [src, edge, dst] mapped to predicates

Node entries can be a simple list of labels or a map with predicates:

-- List form (no predicates) ['person'] -- Map form (with predicates) {'person': 'n.age > 20', 'organisation': 'n.name = "MIT"'}

Edge entries are always a map from triplet to predicate (use empty string '' for no predicate):

{'[person, knows, person]': ''} {'[person, studyat, organisation]': 'r.year > 2010'}

drop_projected_graph

CALL drop_projected_graph('<graph_name>');

Removes a previously projected graph alias.

SHOW_PROJECTED_GRAPHS

CALL SHOW_PROJECTED_GRAPHS() RETURN *;

Lists all currently registered projected graph aliases.

PROJECTED_GRAPH_INFO

CALL PROJECTED_GRAPH_INFO('<graph_name>') RETURN *;

Returns the labels and predicates of a projected graph. Each row contains:

ColumnDescription
labelNode label name or edge triplet (e.g., [person,knows,person])
predicateThe filter predicate string, or empty if none

Algorithms

All algorithms follow the same calling convention:

CALL <algorithm_name>('<projected_graph>', {<options>}) RETURN <columns>;

Every algorithm returns a node column (the matched nodes) plus one or more result columns. The node column is of type NODE, so you can access node properties via node.<property> in the RETURN clause.

Note: Most algorithms (except Label Propagation) require a homogeneous graph subgraph — exactly one node label and one edge triplet where the source and destination labels match the node label.


PageRank

Computes the PageRank centrality score for each node. Higher scores indicate more influential nodes in the graph.

CALL page_rank('<graph_name>', {<options>}) RETURN node, rank;

Options:

OptionTypeDefaultDescription
damping_factorDOUBLE0.85Probability of following a link (vs. random jump)
max_iterationsINT20Maximum number of iterations
directedBOOLfalseWhether to treat edges as directed
concurrencyINTCPU coresNumber of threads for parallel execution

Output columns:

ColumnTypeDescription
nodeNODEThe node
rankDOUBLEPageRank score

Example:

CALL project_graph('social', ['person'], {'[person, knows, person]': ''}); LOAD gds; CALL page_rank('social', {damping_factor: 0.85, max_iterations: 30}) RETURN node.fName, rank ORDER BY rank DESC;

Predicate support: Both node and edge predicates are supported.


Computes the shortest hop distance from a source node to all reachable nodes.

CALL bfs('<graph_name>', {<options>}) RETURN node, distance;

Options:

OptionTypeDefaultDescription
sourceSTRING(required)The source node’s primary key value
directedBOOLfalseWhether to follow edges in their stored direction only
concurrencyINTCPU coresNumber of threads

Output columns:

ColumnTypeDescription
nodeNODEThe node
distanceINT64Hop count from the source node
pathPATHShortest path from source to this node (optional, only when YIELDed)

Example:

CALL bfs('social', {source: '0'}) RETURN node.fName, distance ORDER BY distance;

With path return:

-- Return the actual shortest path CALL bfs('social', {source: '0'}) YIELD node, distance, path RETURN node.fName, distance, path; -- Extract path details CALL bfs('social', {source: '0'}) YIELD node, distance, path RETURN node.fName, distance, nodes(path) AS path_nodes, relationships(path) AS path_edges;

Predicate support: Both node and edge predicates are supported.


SSSP (Single-Source Shortest Path)

Computes the shortest weighted path distance from a source node to all reachable nodes. Without a weight property, it behaves like BFS but returns DOUBLE distances.

CALL sssp('<graph_name>', {<options>}) RETURN node, distance;

Options:

OptionTypeDefaultDescription
sourceSTRING(required)The source node’s primary key value
directedBOOLfalseWhether to follow edges in their stored direction only
weightSTRING""Edge property name to use as weight (empty = unit weight)
concurrencyINTCPU coresNumber of threads

Output columns:

ColumnTypeDescription
nodeNODEThe node
distanceDOUBLEShortest path distance from the source
pathPATHShortest path from source to this node (optional, only when YIELDed)

Example:

CALL sssp('social', {source: '0', weight: 'cost', directed: true}) RETURN node.fName, distance;

With path return:

-- Return the actual shortest path CALL sssp('social', {source: '0', weight: 'cost'}) YIELD node, distance, path RETURN node.fName, distance, path; -- Find path to a specific target CALL sssp('social', {source: '0', weight: 'cost'}) YIELD node, distance, path WHERE node.id = '42' RETURN distance, path;

Predicate support: Both node and edge predicates are supported.


WCC (Weakly Connected Components)

Assigns each node a component ID. Nodes in the same connected component share the same ID.

CALL wcc('<graph_name>', {<options>}) RETURN node, comp;

Options:

OptionTypeDefaultDescription
concurrencyINTCPU coresNumber of threads

Output columns:

ColumnTypeDescription
nodeNODEThe node
compINT64Component identifier

Example:

CALL wcc('social', {concurrency: 8}) RETURN node.fName, comp ORDER BY comp;

Predicate support: Both node and edge predicates are supported.


LCC (Local Clustering Coefficient)

Measures how close a node’s neighbors are to forming a complete graph (clique). Values range from 0.0 to 1.0.

CALL lcc('<graph_name>', {<options>}) RETURN node, lcc;

Options:

OptionTypeDefaultDescription
directedBOOLfalseWhether to compute the directed clustering coefficient
degree_thresholdINTMAX_INTSkip nodes with degree above this threshold
concurrencyINTCPU coresNumber of threads for parallel execution

Output columns:

ColumnTypeDescription
nodeNODEThe node
lccDOUBLELocal clustering coefficient

Example:

CALL lcc('social', {degree_threshold: 1000}) RETURN node.fName, lcc ORDER BY lcc DESC;

Predicate support: Both node and edge predicates are supported.


K-Core Decomposition

Computes the core number for each node. A node has core number k if it belongs to a k-core (a maximal subgraph where every node has degree >= k) but not a (k+1)-core.

CALL kcore('<graph_name>', {<options>}) RETURN node, core;

Options:

OptionTypeDefaultDescription
kINT2Minimum core number threshold (must be >= 0)
concurrencyINTCPU coresNumber of threads

Output columns:

ColumnTypeDescription
nodeNODEThe node
coreINT64Core number of the node

Example:

CALL kcore('social', {k: 3}) RETURN node.fName, core ORDER BY core DESC;

Predicate support: Both node and edge predicates are supported.


CDLP (Community Detection using Label Propagation)

A community detection algorithm that propagates labels through the network. Each node is initially assigned a unique label; in each iteration, every node adopts the most frequent label among its neighbors.

CALL cdlp('<graph_name>', {<options>}) RETURN node, label;

Options:

OptionTypeDefaultDescription
max_iterationsINT5Maximum number of propagation iterations
concurrencyINT1Number of threads for parallel execution

Output columns:

ColumnTypeDescription
nodeNODEThe node
labelINT64Community label assigned to this node

Example:

CALL project_graph( 'study_net', {'person': 'n.age > 20', 'organisation': 'n.name = "MIT"'}, {'[person, studyat, organisation]': 'r.year > 2010'} ); LOAD gds; CALL cdlp('study_net', {concurrency: 10}) RETURN node.id, node.fName, node.name, label;

Predicate support: Both node and edge predicates are supported.

Note: CDLP currently requires a homogeneous graph like other algorithms. Multi-label support is planned for a future release.


Louvain

A community detection algorithm that optimizes modularity by iteratively moving nodes between communities and aggregating the graph into super-nodes.

CALL louvain('<graph_name>', {<options>}) RETURN node, community;

Options:

OptionTypeDefaultDescription
resolutionDOUBLE1.0Resolution parameter (gamma). Values > 1 favor smaller communities, < 1 favor larger communities
directedBOOLfalseWhether to treat the graph as directed
thresholdDOUBLE1e-7Modularity gain threshold for convergence
concurrencyINTCPU coresNumber of threads for parallel execution

Output columns:

ColumnTypeDescription
nodeNODEThe node
communityINT64Community ID (0-based)

Example:

CALL louvain('social', {resolution: 1.0, concurrency: 8}) RETURN node.fName, community ORDER BY community;

Predicate support: Neither node nor edge predicates are supported.


Leiden

A community detection algorithm that improves upon Louvain by adding a refinement phase. This refinement allows communities to be split during execution, leading to better detection of small communities and higher-quality partitions.

CALL leiden('<graph_name>', {<options>}) RETURN node, community;

Options:

OptionTypeDefaultDescription
resolutionDOUBLE1.0Resolution parameter (gamma). Values > 1 favor smaller communities, < 1 favor larger communities
directedBOOLfalseWhether to treat the graph as directed
thresholdDOUBLE1e-7Modularity gain threshold for convergence
concurrencyINTCPU coresNumber of threads for parallel execution

Output columns:

ColumnTypeDescription
nodeNODEThe node
communityINT64Community ID (0-based)

Example:

CALL leiden('social', {resolution: 1.5, concurrency: 8}) RETURN node.fName, community ORDER BY community;

Predicate support: Neither node nor edge predicates are supported.

Leiden vs. Louvain: Use Leiden when you need higher-quality community partitions or better detection of small communities. Use Louvain when you need the fastest possible execution.

Shortest Path Return

BFS and SSSP algorithms support returning the actual shortest path (sequence of nodes and relationships) in addition to the distance. The path is returned as a PATH type that supports all standard Cypher path functions.

Requesting the Path

The path column is optional and only returned when explicitly YIELDed:

-- Without path (default, fastest) CALL bfs('graph', {source: '0'}) RETURN node, distance; -- With path (returns shortest path from source to each node) CALL bfs('graph', {source: '0'}) YIELD node, distance, path RETURN node, distance, path;

Working with Paths

Use standard Cypher path functions to extract information:

-- Get nodes and relationships in the path CALL bfs('graph', {source: '0'}) YIELD node, distance, path RETURN nodes(path) AS path_nodes, relationships(path) AS path_rels, length(path) AS path_length; -- Filter paths by length CALL bfs('graph', {source: '0'}) YIELD node, distance, path WHERE length(path) > 2 RETURN node, distance, path; -- Find specific target CALL sssp('graph', {source: '0', weight: 'cost'}) YIELD node, distance, path WHERE node.id = '42' RETURN distance, path;

Performance Considerations

  • Zero overhead when not YIELDed: If you don’t request the path column, there is no performance penalty compared to distance-only queries.
  • Default is full mode: By default, paths include all vertex and edge properties, matching the behavior of standard MATCH p = ... queries for backward compatibility.
  • Large result sets: When returning paths for many nodes, be aware that path serialization includes all vertex and edge properties, which can be memory-intensive for large graphs.

Algorithm Summary

AlgorithmCALL NameOutput ColumnsKey Options
PageRankpage_ranknode, rankdamping_factor, max_iterations, directed
BFSbfsnode, distance, pathsource (required), directed
SSSPssspnode, distance, pathsource (required), weight, directed
WCCwccnode, compconcurrency
LCClccnode, lccdirected, degree_threshold
K-Corekcorenode, corek
CDLPcdlpnode, labelmax_iterations
Louvainlouvainnode, communityresolution, directed, threshold, concurrency
Leidenleidennode, communityresolution, directed, threshold, concurrency

Note: The path column for BFS and SSSP is optional and only returned when explicitly YIELDed. See the individual algorithm sections for details.

Common Options

All algorithms accept a concurrency option that controls the number of threads used for parallel computation. The default depends on the algorithm:

  • Most algorithms: defaults to the number of CPU cores
  • Label Propagation, Personalized PageRank: defaults to 1

Limitations

  • All algorithms require a homogeneous graph subgraph (exactly one node label and one edge triplet [A, edge, A]). Support for heterogeneous graphs is planned for a future release.
  • Node and edge predicates are supported by all algorithms except Louvain and Leiden.
  • CDLP does not actually support heterogeneous graphs yet — it only processes the first node label and edge triplet. True multi-label support is planned.
Last updated on