GDS (Graph Data Science) Extension
Since NeuG v0.1.3, we have introduced the GDS extension, which provides a collection of graph algorithms that run on projected subgraphs.
It enables common analytical workloads — community detection, centrality analysis, shortest-path computation — directly inside NeuG through the CALL interface.
Quick Start
-- 1. Load the extension
LOAD gds;
-- 2. Project a subgraph
CALL project_graph(
'social',
['person'],
{'[person, knows, person]': ''}
);
-- 3. Run an algorithm
CALL page_rank('social', {max_iterations: 20})
RETURN node.fName, rank;
-- 4. Clean up
CALL drop_projected_graph('social');Graph Projection
Before running any GDS algorithm, you must create a projected graph — a named subgraph view that defines which node labels, edge triplets, and optional predicates the algorithm operates on.
project_graph
CALL project_graph(
'<graph_name>',
<vertex_entries>,
<edge_entries>
);Parameters:
| Parameter | Type | Description |
|---|---|---|
graph_name | STRING | A unique alias for the projected subgraph |
node_entries | LIST or MAP | Node labels, optionally with predicates |
edge_entries | MAP | Edge triplets [src, edge, dst] mapped to predicates |
Node entries can be a simple list of labels or a map with predicates:
-- List form (no predicates)
['person']
-- Map form (with predicates)
{'person': 'n.age > 20', 'organisation': 'n.name = "MIT"'}Edge entries are always a map from triplet to predicate (use empty string ''
for no predicate):
{'[person, knows, person]': ''}
{'[person, studyat, organisation]': 'r.year > 2010'}drop_projected_graph
CALL drop_projected_graph('<graph_name>');Removes a previously projected graph alias.
SHOW_PROJECTED_GRAPHS
CALL SHOW_PROJECTED_GRAPHS() RETURN *;Lists all currently registered projected graph aliases.
PROJECTED_GRAPH_INFO
CALL PROJECTED_GRAPH_INFO('<graph_name>') RETURN *;Returns the labels and predicates of a projected graph. Each row contains:
| Column | Description |
|---|---|
| label | Node label name or edge triplet (e.g., [person,knows,person]) |
| predicate | The filter predicate string, or empty if none |
Algorithms
All algorithms follow the same calling convention:
CALL <algorithm_name>('<projected_graph>', {<options>})
RETURN <columns>;Every algorithm returns a node column (the matched nodes) plus one or more
result columns. The node column is of type NODE, so you can access node
properties via node.<property> in the RETURN clause.
Note: Most algorithms (except Label Propagation) require a homogeneous graph subgraph — exactly one node label and one edge triplet where the source and destination labels match the node label.
PageRank
Computes the PageRank centrality score for each node. Higher scores indicate more influential nodes in the graph.
CALL page_rank('<graph_name>', {<options>})
RETURN node, rank;Options:
| Option | Type | Default | Description |
|---|---|---|---|
damping_factor | DOUBLE | 0.85 | Probability of following a link (vs. random jump) |
max_iterations | INT | 20 | Maximum number of iterations |
directed | BOOL | false | Whether to treat edges as directed |
concurrency | INT | CPU cores | Number of threads for parallel execution |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
rank | DOUBLE | PageRank score |
Example:
CALL project_graph('social', ['person'], {'[person, knows, person]': ''});
LOAD gds;
CALL page_rank('social', {damping_factor: 0.85, max_iterations: 30})
RETURN node.fName, rank
ORDER BY rank DESC;Predicate support: Both node and edge predicates are supported.
BFS (Breadth-First Search)
Computes the shortest hop distance from a source node to all reachable nodes.
CALL bfs('<graph_name>', {<options>})
RETURN node, distance;Options:
| Option | Type | Default | Description |
|---|---|---|---|
source | STRING | (required) | The source node’s primary key value |
directed | BOOL | false | Whether to follow edges in their stored direction only |
concurrency | INT | CPU cores | Number of threads |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
distance | INT64 | Hop count from the source node |
path | PATH | Shortest path from source to this node (optional, only when YIELDed) |
Example:
CALL bfs('social', {source: '0'})
RETURN node.fName, distance
ORDER BY distance;With path return:
-- Return the actual shortest path
CALL bfs('social', {source: '0'})
YIELD node, distance, path
RETURN node.fName, distance, path;
-- Extract path details
CALL bfs('social', {source: '0'})
YIELD node, distance, path
RETURN node.fName, distance,
nodes(path) AS path_nodes,
relationships(path) AS path_edges;Predicate support: Both node and edge predicates are supported.
SSSP (Single-Source Shortest Path)
Computes the shortest weighted path distance from a source node to all
reachable nodes. Without a weight property, it behaves like BFS but returns
DOUBLE distances.
CALL sssp('<graph_name>', {<options>})
RETURN node, distance;Options:
| Option | Type | Default | Description |
|---|---|---|---|
source | STRING | (required) | The source node’s primary key value |
directed | BOOL | false | Whether to follow edges in their stored direction only |
weight | STRING | "" | Edge property name to use as weight (empty = unit weight) |
concurrency | INT | CPU cores | Number of threads |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
distance | DOUBLE | Shortest path distance from the source |
path | PATH | Shortest path from source to this node (optional, only when YIELDed) |
Example:
CALL sssp('social', {source: '0', weight: 'cost', directed: true})
RETURN node.fName, distance;With path return:
-- Return the actual shortest path
CALL sssp('social', {source: '0', weight: 'cost'})
YIELD node, distance, path
RETURN node.fName, distance, path;
-- Find path to a specific target
CALL sssp('social', {source: '0', weight: 'cost'})
YIELD node, distance, path
WHERE node.id = '42'
RETURN distance, path;Predicate support: Both node and edge predicates are supported.
WCC (Weakly Connected Components)
Assigns each node a component ID. Nodes in the same connected component share the same ID.
CALL wcc('<graph_name>', {<options>})
RETURN node, comp;Options:
| Option | Type | Default | Description |
|---|---|---|---|
concurrency | INT | CPU cores | Number of threads |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
comp | INT64 | Component identifier |
Example:
CALL wcc('social', {concurrency: 8})
RETURN node.fName, comp
ORDER BY comp;Predicate support: Both node and edge predicates are supported.
LCC (Local Clustering Coefficient)
Measures how close a node’s neighbors are to forming a complete graph (clique). Values range from 0.0 to 1.0.
CALL lcc('<graph_name>', {<options>})
RETURN node, lcc;Options:
| Option | Type | Default | Description |
|---|---|---|---|
directed | BOOL | false | Whether to compute the directed clustering coefficient |
degree_threshold | INT | MAX_INT | Skip nodes with degree above this threshold |
concurrency | INT | CPU cores | Number of threads for parallel execution |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
lcc | DOUBLE | Local clustering coefficient |
Example:
CALL lcc('social', {degree_threshold: 1000})
RETURN node.fName, lcc
ORDER BY lcc DESC;Predicate support: Both node and edge predicates are supported.
K-Core Decomposition
Computes the core number for each node. A node has core number k if it belongs to a k-core (a maximal subgraph where every node has degree >= k) but not a (k+1)-core.
CALL kcore('<graph_name>', {<options>})
RETURN node, core;Options:
| Option | Type | Default | Description |
|---|---|---|---|
k | INT | 2 | Minimum core number threshold (must be >= 0) |
concurrency | INT | CPU cores | Number of threads |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
core | INT64 | Core number of the node |
Example:
CALL kcore('social', {k: 3})
RETURN node.fName, core
ORDER BY core DESC;Predicate support: Both node and edge predicates are supported.
CDLP (Community Detection using Label Propagation)
A community detection algorithm that propagates labels through the network. Each node is initially assigned a unique label; in each iteration, every node adopts the most frequent label among its neighbors.
CALL cdlp('<graph_name>', {<options>})
RETURN node, label;Options:
| Option | Type | Default | Description |
|---|---|---|---|
max_iterations | INT | 5 | Maximum number of propagation iterations |
concurrency | INT | 1 | Number of threads for parallel execution |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
label | INT64 | Community label assigned to this node |
Example:
CALL project_graph(
'study_net',
{'person': 'n.age > 20', 'organisation': 'n.name = "MIT"'},
{'[person, studyat, organisation]': 'r.year > 2010'}
);
LOAD gds;
CALL cdlp('study_net', {concurrency: 10})
RETURN node.id, node.fName, node.name, label;Predicate support: Both node and edge predicates are supported.
Note: CDLP currently requires a homogeneous graph like other algorithms. Multi-label support is planned for a future release.
Louvain
A community detection algorithm that optimizes modularity by iteratively moving nodes between communities and aggregating the graph into super-nodes.
CALL louvain('<graph_name>', {<options>})
RETURN node, community;Options:
| Option | Type | Default | Description |
|---|---|---|---|
resolution | DOUBLE | 1.0 | Resolution parameter (gamma). Values > 1 favor smaller communities, < 1 favor larger communities |
directed | BOOL | false | Whether to treat the graph as directed |
threshold | DOUBLE | 1e-7 | Modularity gain threshold for convergence |
concurrency | INT | CPU cores | Number of threads for parallel execution |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
community | INT64 | Community ID (0-based) |
Example:
CALL louvain('social', {resolution: 1.0, concurrency: 8})
RETURN node.fName, community
ORDER BY community;Predicate support: Neither node nor edge predicates are supported.
Leiden
A community detection algorithm that improves upon Louvain by adding a refinement phase. This refinement allows communities to be split during execution, leading to better detection of small communities and higher-quality partitions.
CALL leiden('<graph_name>', {<options>})
RETURN node, community;Options:
| Option | Type | Default | Description |
|---|---|---|---|
resolution | DOUBLE | 1.0 | Resolution parameter (gamma). Values > 1 favor smaller communities, < 1 favor larger communities |
directed | BOOL | false | Whether to treat the graph as directed |
threshold | DOUBLE | 1e-7 | Modularity gain threshold for convergence |
concurrency | INT | CPU cores | Number of threads for parallel execution |
Output columns:
| Column | Type | Description |
|---|---|---|
node | NODE | The node |
community | INT64 | Community ID (0-based) |
Example:
CALL leiden('social', {resolution: 1.5, concurrency: 8})
RETURN node.fName, community
ORDER BY community;Predicate support: Neither node nor edge predicates are supported.
Leiden vs. Louvain: Use Leiden when you need higher-quality community partitions or better detection of small communities. Use Louvain when you need the fastest possible execution.
Shortest Path Return
BFS and SSSP algorithms support returning the actual shortest path (sequence of nodes
and relationships) in addition to the distance. The path is returned as a PATH type
that supports all standard Cypher path functions.
Requesting the Path
The path column is optional and only returned when explicitly YIELDed:
-- Without path (default, fastest)
CALL bfs('graph', {source: '0'})
RETURN node, distance;
-- With path (returns shortest path from source to each node)
CALL bfs('graph', {source: '0'})
YIELD node, distance, path
RETURN node, distance, path;Working with Paths
Use standard Cypher path functions to extract information:
-- Get nodes and relationships in the path
CALL bfs('graph', {source: '0'})
YIELD node, distance, path
RETURN nodes(path) AS path_nodes,
relationships(path) AS path_rels,
length(path) AS path_length;
-- Filter paths by length
CALL bfs('graph', {source: '0'})
YIELD node, distance, path
WHERE length(path) > 2
RETURN node, distance, path;
-- Find specific target
CALL sssp('graph', {source: '0', weight: 'cost'})
YIELD node, distance, path
WHERE node.id = '42'
RETURN distance, path;Performance Considerations
- Zero overhead when not YIELDed: If you don’t request the
pathcolumn, there is no performance penalty compared to distance-only queries. - Default is full mode: By default, paths include all vertex and edge properties,
matching the behavior of standard
MATCH p = ...queries for backward compatibility. - Large result sets: When returning paths for many nodes, be aware that path serialization includes all vertex and edge properties, which can be memory-intensive for large graphs.
Algorithm Summary
| Algorithm | CALL Name | Output Columns | Key Options |
|---|---|---|---|
| PageRank | page_rank | node, rank | damping_factor, max_iterations, directed |
| BFS | bfs | node, distance, path | source (required), directed |
| SSSP | sssp | node, distance, path | source (required), weight, directed |
| WCC | wcc | node, comp | concurrency |
| LCC | lcc | node, lcc | directed, degree_threshold |
| K-Core | kcore | node, core | k |
| CDLP | cdlp | node, label | max_iterations |
| Louvain | louvain | node, community | resolution, directed, threshold, concurrency |
| Leiden | leiden | node, community | resolution, directed, threshold, concurrency |
Note: The path column for BFS and SSSP is optional and only returned when explicitly YIELDed. See the individual algorithm sections for details.
Common Options
All algorithms accept a concurrency option that controls the number of
threads used for parallel computation. The default depends on the algorithm:
- Most algorithms: defaults to the number of CPU cores
- Label Propagation, Personalized PageRank: defaults to
1
Limitations
- All algorithms require a homogeneous graph subgraph (exactly one node
label and one edge triplet
[A, edge, A]). Support for heterogeneous graphs is planned for a future release. - Node and edge predicates are supported by all algorithms except Louvain and Leiden.
- CDLP does not actually support heterogeneous graphs yet — it only processes the first node label and edge triplet. True multi-label support is planned.