Supported Gremlin Steps#
Introduction#
This document explains how to work with the Gremlin graph traversal language in GraphScope. On the one hand, we retain the original syntax of most steps from standard Gremlin. On the other hand, we extend the usage of some steps to express more complex situations in real-world scenarios.
Standard Steps#
We retain the original syntax of the following steps from standard Gremlin.
Source#
Expand#
outE()#
Map the vertex to its outgoing incident edges given the edge labels.
Parameters: edgeLabels - the edge labels to traverse.
g.V().outE("knows")
g.V().outE("knows", "created")
inE()#
Map the vertex to its incoming incident edges given the edge labels.
Parameters: edgeLabels - the edge labels to traverse.
g.V().inE("knows")
g.V().inE("knows", "created")
bothE()#
Map the vertex to its incident edges given the edge labels.
Parameters: edgeLabels - the edge labels to traverse.
g.V().bothE("knows")
g.V().bothE("knows", "created")
out()#
Map the vertex to its outgoing adjacent vertices given the edge labels.
Parameters: edgeLabels - the edge labels to traverse.
g.V().out("knows")
g.V().out("knows", "created")
in()#
Map the vertex to its incoming adjacent vertices given the edge labels.
Parameters: edgeLabels - the edge labels to traverse.
g.V().in("knows")
g.V().in("knows", "created")
both()#
Map the vertex to its adjacent vertices given the edge labels.
Parameters: edgeLabels - the edge labels to traverse.
g.V().both("knows")
g.V().both("knows", "created")
otherV()#
Map the edge to the incident vertex that was not just traversed from in the path history.
g.V().bothE().otherV() # = g.V().both()
bothV()#
Map the edge to its incident vertices.
g.V().outE().bothV() # both endpoints of the outgoing edges
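The following minimal plain-Python sketch makes the expand steps concrete. It runs over an in-memory copy of the TinkerPop "modern" toy graph, which the running examples later on this page appear to use. The helper names (`out_e`, `in_`, `other_v`, `traverse`, ...) are hypothetical and only model what each step returns. This is not GraphScope's implementation.

```python
# Edges of the TinkerPop "modern" graph as (edge id, source, label, target).
EDGES = [
    (7, 1, "knows", 2), (8, 1, "knows", 4), (9, 1, "created", 3),
    (10, 4, "created", 5), (11, 4, "created", 3), (12, 6, "created", 3),
]
ALL_VERTICES = [1, 2, 3, 4, 5, 6]  # models g.V()


def _match(edge, labels):
    # No labels means "any label", as in g.V().outE().
    return not labels or edge[2] in labels


def out_e(v, *labels):
    """outE(labels): outgoing incident edges of v."""
    return [e for e in EDGES if e[1] == v and _match(e, labels)]


def in_e(v, *labels):
    """inE(labels): incoming incident edges of v."""
    return [e for e in EDGES if e[3] == v and _match(e, labels)]


def both_e(v, *labels):
    """bothE(labels): all incident edges of v."""
    return out_e(v, *labels) + in_e(v, *labels)


def out(v, *labels):
    """out(labels): targets of the outgoing edges."""
    return [e[3] for e in out_e(v, *labels)]


def in_(v, *labels):
    """in(labels): sources of the incoming edges ("in" is a Python keyword)."""
    return [e[1] for e in in_e(v, *labels)]


def both(v, *labels):
    """both(labels): adjacent vertices in either direction."""
    return out(v, *labels) + in_(v, *labels)


def other_v(edge, came_from):
    """otherV(): the endpoint of `edge` we did not arrive from."""
    _, src, _, dst = edge
    return dst if came_from == src else src


def traverse(starts, step, *labels):
    """Apply a per-vertex step to every traverser, like g.V().out(...)."""
    return [w for v in starts for w in step(v, *labels)]


if __name__ == "__main__":
    print(traverse(ALL_VERTICES, out, "knows"))  # [2, 4]
    # g.V().bothE().otherV() yields the same vertices as g.V().both()
    via_edges = [other_v(e, v) for v in ALL_VERTICES for e in both_e(v)]
    print(via_edges == traverse(ALL_VERTICES, both))  # True
```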
Filter#
hasId()#
The hasId()-step is meant to filter graph elements based on their identifiers.
Parameters: elementIds - identifiers of the elements.
g.V().hasId(1) # = g.V(1)
g.V().hasId(1,2,3) # = g.V(1,2,3)
hasLabel()#
The hasLabel()-step is meant to filter graph elements based on their labels.
Parameters: labels - labels of the elements.
g.V().hasLabel("person")
g.V().hasLabel("person", "software")
has()#
The has()-step is meant to filter graph elements by applying predicates on their properties.
Parameters:
propertyKey - the key of the property to filter on for existence.
g.V().has("name") # find vertices containing property `name`
propertyKey - the key of the property to filter on, value - the value to compare the accessor value to for equality.
g.V().has("age", 10)
g.V().has("name", "marko")
g.E().has("weight", 1.0)
propertyKey - the key of the property to filter on, predicate - the filter to apply to the key’s value.
g.V().has("age", P.eq(10))
g.V().has("age", P.neq(10))
g.V().has("age", P.gt(10))
g.V().has("age", P.lt(10))
g.V().has("age", P.gte(10))
g.V().has("age", P.lte(10))
g.V().has("age", P.within([10, 20]))
g.V().has("age", P.without([10, 20]))
g.V().has("age", P.inside(10, 20))
g.V().has("age", P.outside(10, 20))
g.V().has("age", P.not(P.eq(10))) # = g.V().has("age", P.neq(10))
g.V().has("name", TextP.startingWith("mar"))
g.V().has("name", TextP.endingWith("rko"))
g.V().has("name", TextP.containing("ark"))
g.V().has("name", TextP.notStartingWith("mar"))
g.V().has("name", TextP.notEndingWith("rko"))
g.V().has("name", TextP.notContaining("ark"))
label - the label of the Element, propertyKey - the key of the property to filter on, value - the value to compare the accessor value to for equality.
g.V().has("person", "id", 1) # = g.V().hasLabel("person").has("id", 1)
label - the label of the Element, propertyKey - the key of the property to filter on, predicate - the filter to apply to the key’s value.
g.V().has("person", "age", P.eq(10)) # = g.V().hasLabel("person").has("age", P.eq(10))
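The sketch below models the has() forms and the P/TextP predicates in plain Python, using vertex properties from the TinkerPop "modern" graph. It assumes the standard TinkerPop meanings: inside() and outside() exclude both bounds, and an element lacking the property never passes. The helper names (`eq`, `inside`, `starting_with`, `has`, ...) are hypothetical stand-ins, not GraphScope's API.

```python
# Vertex properties of the TinkerPop "modern" graph.
VERTICES = [
    {"id": 1, "label": "person", "name": "marko", "age": 29},
    {"id": 2, "label": "person", "name": "vadas", "age": 27},
    {"id": 3, "label": "software", "name": "lop", "lang": "java"},
    {"id": 4, "label": "person", "name": "josh", "age": 32},
    {"id": 5, "label": "software", "name": "ripple", "lang": "java"},
    {"id": 6, "label": "person", "name": "peter", "age": 35},
]


# Hypothetical stand-ins for P.* and TextP.*: each returns a one-argument test.
def eq(x): return lambda v: v == x
def neq(x): return lambda v: v != x
def gt(x): return lambda v: v > x
def lt(x): return lambda v: v < x
def gte(x): return lambda v: v >= x
def lte(x): return lambda v: v <= x
def within(xs): return lambda v: v in xs
def without(xs): return lambda v: v not in xs
def inside(lo, hi): return lambda v: lo < v < hi         # both bounds excluded
def outside(lo, hi): return lambda v: v < lo or v > hi   # both bounds excluded
def not_(p): return lambda v: not p(v)                   # e.g. notContaining
def starting_with(s): return lambda v: v.startswith(s)
def ending_with(s): return lambda v: v.endswith(s)
def containing(s): return lambda v: s in v


_MISSING = object()


def has(elements, key, test=_MISSING, label=None):
    """has(key), has(key, value), has(key, predicate) and has(label, key, ...)."""
    kept = []
    for el in elements:
        if label is not None and el["label"] != label:
            continue
        if key not in el:                 # the property must exist
            continue
        if test is _MISSING:              # has(key): existence check only
            kept.append(el)
        elif (test if callable(test) else eq(test))(el[key]):
            kept.append(el)               # a plain value means equality
    return kept


def names(elements):
    return [el["name"] for el in elements]
```

For example, `names(has(VERTICES, "age", inside(27, 32)))` keeps only marko, because both bounds are excluded.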
hasNot()#
The hasNot()-step is meant to filter graph elements based on the non-existence of properties.
Parameters: propertyKey - the key of the property to filter on for non-existence.
g.V().hasNot("age") # find vertices not-containing property `age`
is()#
The is()-step filters out the object if it is not equal to the provided value or fails the provided predicate.
Parameters:
value - the value that the object must equal.
g.V().out().count().is(1)
predicate - the filter to apply.
g.V().out().count().is(P.eq(1))
where(traversal)#
The where(traversal)-step is meant to filter the current object by applying it to the nested traversal.
Parameters: whereTraversal - the traversal to apply.
g.V().where(out().count())
g.V().where(out().count().is(gt(0)))
where(predicate)#
The where(predicate)-step is meant to filter the traverser based on the predicate acting on different tags.
Parameters:
predicate - the predicate containing another tag to apply.
# is the current entry equal to the entry referred by `a`?
g.V().as("a").out().out().where(P.eq("a"))
startKey - the tag containing the object to filter, predicate - the predicate containing another tag to apply.
# is the entry referred by `b` equal to the entry referred by `a`?
g.V().as("a").out().out().as("b").where("b", P.eq("a"))
The by()-modulator can be applied to a number of different steps to alter their behaviors. Here are the supported usages of the modulated by()-step after a where()-step:
empty - this form is essentially an identity() modulation.
# = g.V().as("a").out().out().as("b").where("b", P.eq("a"))
g.V().as("a").out().out().as("b").where("b", P.eq("a")).by()
propertyKey - filter by the property value of the specified tag given the property key.
# whether entry `b` and entry `a` have the same property value of `name`?
g.V().as("a").out().out().as("b").where("b", P.eq("a")).by("name")
traversal - filter by the computed value after applying the specified tag to the nested traversal.
# whether entry `b` and entry `a` have the same count of one-hop neighbors?
g.V().as("a").out().out().as("b").where("b", P.eq("a")).by(out().count())
not(traversal)#
The not()-step is the opposite of the where()-step. It removes objects from the traversal stream when the traversal provided as an argument returns any objects.
Parameters: notTraversal - the traversal to filter by.
g.V().not(out().count())
g.V().not(out().count().is(gt(0)))
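Here is a minimal plain-Python model of where(traversal) and not(traversal). It assumes standard TinkerPop semantics: a traverser passes where() if the nested traversal emits at least one result, and passes not() if it emits none. The graph is the outgoing adjacency of the TinkerPop "modern" graph, and the function names are hypothetical. Under these semantics a bare out().count() always emits a value (possibly 0), so it is the is(gt(0)) filter that separates the vertices. The source does not state whether GraphScope treats the bare form the same way.

```python
# Outgoing adjacency of the TinkerPop "modern" graph (all edge labels).
OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}
ALL_VERTICES = sorted(OUT)


def where(starts, nested):
    """where(traversal): keep v if the nested traversal emits at least one result."""
    return [v for v in starts if nested(v)]


def not_(starts, nested):
    """not(traversal): keep v only if the nested traversal emits nothing."""
    return [v for v in starts if not nested(v)]


def out_count(v):
    """out().count(): always emits exactly one value, possibly 0."""
    return [len(OUT[v])]


def out_count_is_gt0(v):
    """out().count().is(gt(0)): emits the count only when it passes the filter."""
    return [c for c in out_count(v) if c > 0]
```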
dedup()#
Remove all duplicates in the traversal stream up to this point.
Parameters: dedupLabels - composition of the given labels determines de-duplication. No labels implies current object.
g.V().dedup()
g.V().as("a").out().dedup("a") # dedup by entry `a`
g.V().as("a").out().as("b").dedup("a", "b") # dedup by the composition of entry `a` and `b`
Usages of the modulated by()-step:
propertyKey - dedup by the property value of the current object or the specified tag given the property key.
# dedup by the property value of `name` of the current entry
g.V().dedup().by("name")
# dedup by the property value of `name` of the entry `a`
g.V().as("a").out().dedup("a").by("name")
token - dedup by the token value of the current object or the specified tag.
g.V().dedup().by(T.id)
g.V().dedup().by(T.label)
g.V().as("a").out().dedup("a").by(T.id)
g.V().as("a").out().dedup("a").by(T.label)
traversal - dedup by the computed value after applying the current object or the specified tag to the nested traversal.
g.V().dedup().by(out().count())
g.V().as("a").out().dedup("a").by(out().count())
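The dedup() variants above all reduce to "keep one traverser per distinct key". The plain-Python sketch below shows this, where the key is the current object, a tag, a combination of tags, a token, or a nested-traversal value. It uses the TinkerPop "modern" graph, and `dedup` and `paths` are hypothetical names. The sketch keeps the first traverser seen for each key. A parallel engine need not guarantee which duplicate survives.

```python
# Vertices of the TinkerPop "modern" graph: id -> (label, name).
VERTICES = {1: ("person", "marko"), 2: ("person", "vadas"), 3: ("software", "lop"),
            4: ("person", "josh"), 5: ("software", "ripple"), 6: ("person", "peter")}
OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}


def dedup(traversers, key=lambda t: t):
    """Keep the first traverser seen for each distinct key."""
    seen, kept = set(), []
    for t in traversers:
        k = key(t)
        if k not in seen:
            seen.add(k)
            kept.append(t)
    return kept


# g.V().as("a").out().as("b") modeled as a list of tag bindings.
paths = [{"a": a, "b": b} for a in VERTICES for b in OUT[a]]

# dedup("a")            -> key=lambda p: p["a"]
# dedup("a", "b")       -> key=lambda p: (p["a"], p["b"])
# dedup().by(T.label)   -> key=lambda v: VERTICES[v][0]
# dedup().by(out().count()) -> key=lambda v: len(OUT[v])
```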
Project#
constant()#
The constant()-step is meant to map any object to a fixed object value.
Parameters: value - a fixed object value.
g.V().constant(1)
g.V().constant("marko")
g.V().constant(1.0)
valueMap()#
The valueMap()-step is meant to map the graph element to a map of the property entries according to their actual properties. If no property keys are provided, then all property values are retrieved.
Parameters: propertyKeys - the properties to retrieve.
g.V().valueMap()
g.V().valueMap("name")
g.V().valueMap("name", "age")
values()#
The values()-step maps the graph element to the value of the property with the given key. Unlike standard Gremlin, which accepts multiple keys, only one property key is allowed here so that the step can be implemented as a map rather than a flat-map.
Parameters: propertyKey - the property to retrieve its value from.
g.V().values("name")
select()#
The select()-step is meant to map the traverser to the object specified by the selectKey or to a map projection of sideEffect values.
Parameters: selectKeys - the keys to project.
g.V().as("a").select("a")
g.V().as("a").out().as("b").select("a", "b")
Usages of the modulated by()-step:
empty - an identity() modulation.
# = g.V().as("a").select("a")
g.V().as("a").select("a").by()
# = g.V().as("a").out().as("b").select("a", "b")
g.V().as("a").out().as("b").select("a", "b").by().by()
token - project the token value of the specified tag.
g.V().as("a").select("a").by(T.id)
g.V().as("a").select("a").by(T.label)
propertyKey - project the property value of the specified tag given the property key.
g.V().as("a").select("a").by("name")
traversal - project the computed value after applying the specified tag to the nested traversal.
g.V().as("a").select("a").by(valueMap("name", "id"))
g.V().as("a").select("a").by(out().count())
Aggregate#
fold()#
Rolls up objects in the stream into an aggregate list.
# take the first 10 vertices from the stream and fold them into a single list
g.V().limit(10).fold()
group()#
Organize objects in the stream into a Map. Calls to group() are typically accompanied with by() modulators which help specify how the grouping should occur.
Usages of the key by()-step:
empty - group the elements in the stream by the current value.
g.V().group().by() # = g.V().group()
propertyKey - group the elements in the stream by the property value of the current object given the property key.
g.V().group().by("name")
traversal - group the elements in the stream by the computed value after applying the current object to the nested traversal.
g.V().group().by(values("name")) # = g.V().group().by("name")
g.V().group().by(out().count())
Usages of the value by()-step:
empty - fold elements in each group into a list, which is a default behavior.
g.V().group().by().by() # = g.V().group()
propertyKey - for each element in the group, get its property value for the given key.
g.V().group().by().by("name")
aggregateFunc - aggregate function to apply in each group.
g.V().group().by().by(count())
g.V().group().by().by(fold())
# get the property values of `name` of the vertices in each group list
g.V().group().by().by(values("name").fold()) # = g.V().group().by().by("name")
# sum the property values of `age` in each group
g.V().group().by().by(values("age").sum())
# find the minimum value of `age` in each group
g.V().group().by().by(values("age").min())
# find the maximum value of `age` in each group
g.V().group().by().by(values("age").max())
# calculate the average value of `age` in each group
g.V().group().by().by(values("age").mean())
# count the number of distinct elements in each group
g.V().group().by().by(dedup().count())
# de-duplicate in each group list
g.V().group().by().by(dedup().fold())
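Conceptually, group() does two things. The key by() buckets the stream, and the value by() reduces each bucket. The minimal plain-Python sketch below shows this split on the TinkerPop "modern" vertices. `group` and `ages` are hypothetical names. The default key (element identity) is modeled by the vertex id, because Python dicts are not hashable.

```python
from collections import defaultdict

# Vertex properties of the TinkerPop "modern" graph.
VERTICES = [
    {"id": 1, "label": "person", "name": "marko", "age": 29},
    {"id": 2, "label": "person", "name": "vadas", "age": 27},
    {"id": 3, "label": "software", "name": "lop", "lang": "java"},
    {"id": 4, "label": "person", "name": "josh", "age": 32},
    {"id": 5, "label": "software", "name": "ripple", "lang": "java"},
    {"id": 6, "label": "person", "name": "peter", "age": 35},
]


def group(elements, key_fn=lambda el: el["id"], value_fn=list):
    """group().by(key).by(value): bucket by key, then reduce each bucket.

    The default value_fn=list plays the role of fold()."""
    buckets = defaultdict(list)
    for el in elements:
        buckets[key_fn(el)].append(el)
    return {k: value_fn(members) for k, members in buckets.items()}


def ages(members):
    """values("age") inside one group; members without `age` contribute nothing."""
    return [m["age"] for m in members if "age" in m]
```

For example, `group(VERTICES, lambda el: el["label"], len)` corresponds to `g.V().group().by(label).by(count())`.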
groupCount()#
Counts the number of times a particular object has been part of a traversal, returning a map where the object is the key and the value is the count.
Usages of the key by()-step:
empty - group the elements in the stream by the current value.
g.V().groupCount().by() # = g.V().groupCount()
propertyKey - group the elements in the stream by the property value of the current object given the property key.
g.V().groupCount().by("name")
traversal - group the elements in the stream by the computed value after applying the current object to the nested traversal.
g.V().groupCount().by(values("name")) # = g.V().groupCount().by("name")
g.V().groupCount().by(out().count())
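groupCount() behaves like a group() whose value by() is count(). The short plain-Python sketch below uses `collections.Counter` over the TinkerPop "modern" graph. `group_count` is a hypothetical helper name, not part of GraphScope.

```python
from collections import Counter

# Vertices of the TinkerPop "modern" graph: id -> (label, name).
VERTICES = {1: ("person", "marko"), 2: ("person", "vadas"), 3: ("software", "lop"),
            4: ("person", "josh"), 5: ("software", "ripple"), 6: ("person", "peter")}
OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}


def group_count(elements, key_fn=lambda el: el):
    """groupCount().by(key): how many elements fall under each key."""
    return dict(Counter(key_fn(el) for el in elements))


# g.V().groupCount().by(label)        -> group_count(VERTICES, lambda v: VERTICES[v][0])
# g.V().groupCount().by(out().count()) -> group_count(VERTICES, lambda v: len(OUT[v]))
```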
Order#
order()#
Order all the objects in the traversal up to this point and then emit them one-by-one in their ordered sequence.
Usages of the modulated by()-step:
empty - order by the current object in ascending order, which is a default behavior.
g.V().order().by() # = g.V().order()
order - the comparator to apply typically for some order (asc | desc | shuffle).
g.V().order().by(Order.asc) # = g.V().order()
g.V().order().by(Order.desc)
propertyKey - order by the property value of the current object given the property key.
g.V().order().by("name") # default order is asc
g.V().order().by("age")
traversal - order by the computed value after applying the current object to the nested traversal.
g.V().order().by(out().count()) # default order is asc
propertyKey - order by the property value of the current object given the property key, order - the comparator to apply typically for some order.
g.V().order().by("name", Order.desc)
traversal - order by the computed value after applying the current object to the nested traversal, order - the comparator to apply typically for some order.
g.V().order().by(out().count(), Order.desc)
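Each order().by(...) form above is a sort on a key with a direction. A minimal plain-Python sketch over the TinkerPop "modern" graph follows. The `order` helper and its `direction` strings are hypothetical. Python's sort is stable, so ties keep their input order here. Gremlin makes no such promise, so do not rely on tie order.

```python
import random

NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
AGES = {1: 29, 2: 27, 4: 32, 6: 35}  # only persons have `age`
OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}


def order(elements, key_fn=lambda el: el, direction="asc"):
    """order().by(key, Order.asc | Order.desc | Order.shuffle)."""
    elements = list(elements)          # work on a copy
    if direction == "shuffle":
        random.shuffle(elements)
        return elements
    return sorted(elements, key=key_fn, reverse=(direction == "desc"))


# g.V().order().by("name", Order.desc)      -> order(NAMES, NAMES.get, "desc")
# g.V().order().by(out().count(), Order.desc) -> order(OUT, lambda v: len(OUT[v]), "desc")
```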
Statistics#
Union#
Match#
match()#
The match()-step provides a declarative form of graph patterns to match with. With match(), the user provides a collection of “sentences,” called patterns, that have variables defined that must hold true throughout the duration of the match(). For most of the complex graph patterns, it is usually much easier to express via match() than with single-path traversals.
Parameters: matchSentences - a collection of patterns. Each pattern consists of a start tag, a series of Gremlin steps (binders), and an end tag.
Supported binders within a pattern:
Expand: in()/out()/both(), inE()/outE()/bothE(), inV()/outV()/otherV()/bothV()
PathExpand
Filter: has()/not()/where()
g.V().match(__.as("a").out().as("b"), __.as("b").out().as("c"))
g.V().match(__.as("a").out().out().as("b"), where(__.as("a").out().as("b")))
g.V().match(__.as("a").out().out().as("b"), not(__.as("a").out().as("b")))
g.V().match(__.as("a").out().has("name", "marko").as("b"), __.as("b").out().as("c"))
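One way to read match() is as a join. Each pattern extends a set of tag bindings, and a tag bound by an earlier pattern must agree with later ones. A pattern whose two tags are already bound therefore acts like the where(...) form above. The naive plain-Python sketch below works on the TinkerPop "modern" graph. Its assumptions are stated loudly: `match` is a hypothetical helper, the same vertex may be bound to different tags (homomorphism semantics), and the not(...) anti-join is not modeled. GraphScope's actual matching semantics and join order may differ.

```python
# Outgoing adjacency (all labels) and names of the TinkerPop "modern" graph.
OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}


def out(v):
    return OUT[v]


def match(vertices, patterns):
    """Join patterns given as (start_tag, step, end_tag) into tag bindings.

    `step` maps one vertex to the vertices reachable via the binders."""
    bindings = [{}]
    for start, step, end in patterns:
        extended = []
        for b in bindings:
            sources = [b[start]] if start in b else vertices
            for s in sources:
                for t in step(s):
                    if end in b and b[end] != t:
                        continue              # conflicts with an earlier binding
                    extended.append({**b, start: s, end: t})
        bindings = extended
    return bindings


# match(__.as("a").out().as("b"), __.as("b").out().as("c"))
chain = match(list(OUT), [("a", out, "b"), ("b", out, "c")])
```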
Subgraph#
subgraph()#
An edge-induced subgraph extracted from the original graph.
Parameters: graphName - the name of the side-effect key that will hold the subgraph.
g.E().subgraph("all")
g.V().has('name', "marko").outE("knows").subgraph("partial")
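An edge-induced subgraph contains exactly the selected edges plus their endpoint vertices. Vertices with no selected edge are left out. The plain-Python sketch below shows this on the TinkerPop "modern" graph. The `subgraph` helper and its return shape are hypothetical, and the sketch ignores the side-effect key that names the result.

```python
# Edges of the TinkerPop "modern" graph as (edge id, source, label, target).
EDGES = [
    (7, 1, "knows", 2), (8, 1, "knows", 4), (9, 1, "created", 3),
    (10, 4, "created", 5), (11, 4, "created", 3), (12, 6, "created", 3),
]
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}


def subgraph(edges):
    """Edge-induced subgraph: the given edges plus all of their endpoints."""
    vertices = {v for (_, src, _, dst) in edges for v in (src, dst)}
    return {"vertices": vertices, "edges": {e[0] for e in edges}}


# g.V().has('name', "marko").outE("knows").subgraph("partial")
marko = [v for v, n in NAMES.items() if n == "marko"]
partial = subgraph([e for e in EDGES if e[1] in marko and e[2] == "knows"])
```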
Syntactic Sugars#
The following steps are extended to denote more complex situations.
PathExpand#
In graph querying, expanding a multi-hop path from a starting point is called PathExpand, a common operation in graph scenarios. Different scenarios also call for different expanding strategies: for example, outputting only simple paths, or outputting all vertices explored along the expanding path. We introduce the with()-step to configure the corresponding behaviors of the PathExpand-step.
out()#
Expand a multi-hop path along the outgoing edges, whose length is within the given range.
Parameters: lengthRange - the lower and the upper bounds of the path length, edgeLabels - the edge labels to traverse.
Usages of the with()-step:
keyValuePair - the options to configure the corresponding behaviors of the PathExpand-step.
# expand hops within the range of [1, 10) along the outgoing edges,
# vertices can be duplicated and only the end vertex should be kept
g.V().out("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'END_V')
# expand hops within the range of [1, 10) along the outgoing edges,
# vertices cannot be duplicated and all vertices should be kept
g.V().out("1..10").with('PATH_OPT', 'SIMPLE').with('RESULT_OPT', 'ALL_V')
# = g.V().out("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'END_V')
g.V().out("1..10")
# expand hops within the range of [1, 10) along the outgoing edges whose label is `knows`,
# vertices can be duplicated and only the end vertex should be kept
g.V().out("1..10", "knows")
# expand hops within the range of [1, 10) along the outgoing edges whose label is `knows` or `created`,
# vertices can be duplicated and only the end vertex should be kept
g.V().out("1..10", "knows", "created")
Running Example:
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V')
==>[v[1], v[2]]
==>[v[1], v[4]]
gremlin> g.V().out("1..3", "knows").with('RESULT_OPT', 'END_V').endV()
==>v[2]
==>v[4]
in()#
Expand a multi-hop path along the incoming edges, whose length is within the given range.
g.V().in("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'END_V')
Running Example:
gremlin> g.V().in("1..3", "knows").with('RESULT_OPT', 'ALL_V')
==>[v[2], v[1]]
==>[v[4], v[1]]
gremlin> g.V().in("1..3", "knows").with('RESULT_OPT', 'END_V').endV()
==>v[1]
==>v[1]
both()#
Expand a multi-hop path along the incident edges, whose length is within the given range.
g.V().both("1..10").with('PATH_OPT', 'ARBITRARY').with('RESULT_OPT', 'END_V')
Running Example:
gremlin> g.V().both("1..3", "knows").with('RESULT_OPT', 'ALL_V')
==>[v[2], v[1]]
==>[v[1], v[2]]
==>[v[1], v[4]]
==>[v[2], v[1], v[2]]
==>[v[2], v[1], v[4]]
==>[v[4], v[1]]
==>[v[1], v[2], v[1]]
==>[v[1], v[4], v[1]]
==>[v[4], v[1], v[2]]
==>[v[4], v[1], v[4]]
gremlin> g.V().both("1..3", "knows").with('RESULT_OPT', 'END_V').endV()
==>v[1]
==>v[1]
==>v[2]
==>v[4]
==>v[2]
==>v[1]
==>v[1]
==>v[4]
==>v[2]
==>v[4]
endV()#
By default, all kept vertices are stored in a path collection, which can be unfolded by an endV()-step.
# a path collection containing the vertices within [1, 10) hops
g.V().out("1..10").with('RESULT_OPT', 'ALL_V')
# unfold vertices in the path collection
g.V().out("1..10").with('RESULT_OPT', 'ALL_V').endV()
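The PathExpand options can be modeled as a breadth-first expansion that keeps paths whose hop count lies in [lo, hi). PATH_OPT controls whether a vertex may repeat on a path. RESULT_OPT controls whether whole paths or only end vertices are returned. The plain-Python sketch below is checked against the running examples above, using the `knows` edges of the TinkerPop "modern" graph. `path_expand` and its parameters are hypothetical names. The END_V branch returns vertices already unfolded, as endV() would produce them, and the sketch says nothing about how GraphScope executes PathExpand.

```python
# `knows` edges of the TinkerPop "modern" graph as (source, target).
KNOWS = [(1, 2), (1, 4)]
ALL_VERTICES = [1, 2, 3, 4, 5, 6]


def neighbors(v, direction, edges):
    """One hop along `edges` in the given direction ("out", "in" or "both")."""
    outs = [dst for src, dst in edges if src == v]
    ins = [src for src, dst in edges if dst == v]
    return {"out": outs, "in": ins, "both": outs + ins}[direction]


def path_expand(starts, direction, lo, hi, edges,
                path_opt="ARBITRARY", result_opt="END_V"):
    """Paths with lo..hi-1 hops, i.e. the range written "lo..hi" is [lo, hi)."""
    results = []
    frontier = [[v] for v in starts]          # zero-hop paths
    for hops in range(1, hi):
        frontier = [
            path + [w]
            for path in frontier
            for w in neighbors(path[-1], direction, edges)
            # SIMPLE: a vertex may not appear twice on the same path
            if path_opt != "SIMPLE" or w not in path
        ]
        if hops >= lo:
            results.extend(frontier)
    if result_opt == "ALL_V":
        return results                        # whole paths, start vertex included
    return [path[-1] for path in results]     # END_V, as unfolded by endV()


# g.V().out("1..3", "knows").with('RESULT_OPT', 'ALL_V')
print(path_expand(ALL_VERTICES, "out", 1, 3, KNOWS, result_opt="ALL_V"))  # [[1, 2], [1, 4]]
```

With `path_opt="SIMPLE"`, the `both` expansion drops paths such as `[1, 2, 1]` that revisit a vertex.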
Expression#
Expressions are introduced to denote property-based calculations or filters. They are built from the following basic entries:
@ # the value of the current entry
@.name # the property value of `name` of the current entry
@a # the value of the entry `a`
@a.name # the property value of `name` of the entry `a`
The following operations can be performed on these entries:
arithmetic
@.age + 10
@.age * 10
(@.age + 4) / 10 + (@.age - 5)
logic comparison
@.name == "marko"
@.age != 10
@.age > 10
@.age < 10
@.age >= 10
@.weight <= 10.0
logic connector
@.age > 10 && @.age < 20
@.age < 10 || @.age > 20
bit manipulation
@.num | 2
@.num & 2
@.num ^ 2
@.num >> 2
@.num << 2
exponentiation
@.num ^^ 3
@.num ^^ -3
Expressions can be used in filters and projections:
filter: where(expr("…"))
g.V().where(expr("@.name == \"marko\"")) # = g.V().has("name", "marko")
g.V().where(expr("@.age > 10")) # = g.V().has("age", P.gt(10))
g.V().as("a").out().where(expr("@.name == \"marko\" || (@a.age > 10)"))
project: select(expr("…"))
g.V().select(expr("@.name")) # = g.V().values("name")
Running Example:
gremlin> g.V().where(expr("@.name == \"marko\""))
==>v[1]
gremlin> g.V().as("a").where(expr("@a.name == \"marko\" || (@a.age > 10)"))
==>v[2]
==>v[1]
==>v[4]
==>v[6]
gremlin> g.V().select(expr("@.name"))
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
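To show how the `@`, `@.key` and `@tag.key` entries bind to values, here is a toy plain-Python evaluator. It rewrites a small subset of the expression syntax into Python and calls `eval` on trusted, hand-written strings only. The sketch rests on several loud assumptions. `to_python`, `evaluate` and `where_expr` are hypothetical names. Operator precedence follows Python's rules, which may differ from GraphScope's (Python binds `|`, `&` and `^` tighter than comparisons). A comparison on a missing property is treated as false, which matches the running example above, where lop and ripple (no `age`) are filtered out. This is not how GraphScope parses or evaluates expressions.

```python
import re

# Vertex properties of the TinkerPop "modern" graph.
VERTICES = [
    {"id": 1, "name": "marko", "age": 29},
    {"id": 2, "name": "vadas", "age": 27},
    {"id": 3, "name": "lop", "lang": "java"},
    {"id": 4, "name": "josh", "age": 32},
    {"id": 5, "name": "ripple", "lang": "java"},
    {"id": 6, "name": "peter", "age": 35},
]


def to_python(expr):
    """Rewrite a small subset of the expression syntax as Python source."""
    # @a.name -> _prop("a", "name"); @.name -> _prop("", "name")
    py = re.sub(r"@(\w*)\.(\w+)", r'_prop("\1", "\2")', expr)
    # ^^ is exponentiation (Python **); a single ^ is XOR in both, so it stays.
    py = py.replace("^^", "**")
    return py.replace("&&", " and ").replace("||", " or ")


def evaluate(expr, current, tags=None):
    """Evaluate `expr` with @ bound to `current` and @x bound to tags["x"]."""
    tags = tags or {}

    def _prop(tag, key):
        entry = tags[tag] if tag else current
        return entry.get(key)  # None when the property is absent

    return eval(to_python(expr), {"__builtins__": {}}, {"_prop": _prop})


def where_expr(vertices, expr, tag=None):
    """where(expr(...)); a comparison on a missing property counts as false."""
    kept = []
    for v in vertices:
        try:
            if evaluate(expr, v, {tag: v} if tag else {}):
                kept.append(v["id"])
        except TypeError:  # e.g. None > 10 for a vertex without `age`
            pass
    return kept
```

For example, `where_expr(VERTICES, '@a.name == "marko" || (@a.age > 10)', tag="a")` returns `[1, 2, 4, 6]`, the same vertices as the running example.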
Aggregate (Group)#
The group()-step in standard Gremlin has limited capabilities. Grouping can only be performed on a single key, and only one aggregate calculation can be applied to each group, so it cannot meet the requirement of performing group calculations on multiple keys or values. We therefore extend the group()-step to allow multiple variables, each with its own alias, in the key by()-step and the value by()-step respectively.
Usages of the key by()-step:
# group by the property values of `name` and `age` of the current entry
group().by(values("name").as("k1"), values("age").as("k2"))
# group by the count of one-hop neighbors and the property value of `name` of the current entry
group().by(out().count().as("k1"), values("name").as("k2"))
Usages of the value by()-step:
# calculate the count of vertices and the sum of `age` respectively in each group
group().by("name").by(count().as("v1"), values("age").sum().as("v2"))
Running Example:
gremlin> g.V().hasLabel("person").group().by(values("name").as("k1"), values("age").as("k2"))
==>{[josh, 32]=[v[4]], [vadas, 27]=[v[2]], [peter, 35]=[v[6]], [marko, 29]=[v[1]]}
gremlin> g.V().hasLabel("person").group().by(out().count().as("k1"), values("name").as("k2"))
==>{[2, josh]=[v[4]], [0, vadas]=[v[2]], [3, marko]=[v[1]], [1, peter]=[v[6]]}
gremlin> g.V().hasLabel("person").group().by("name").by(count().as("v1"), values("age").sum().as("v2"))
==>{marko=[1, 29], peter=[1, 35], josh=[1, 32], vadas=[1, 27]}
gremlin> g.V().hasLabel("person").group().by("name").by(count().as("v1"), values("age").sum().as("v2")).select("v1", "v2")
==>{v1=1, v2=35}
==>{v1=1, v2=32}
==>{v1=1, v2=27}
==>{v1=1, v2=29}
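The multi-key, multi-value group() can be modeled by grouping on a tuple of key values and computing one named aggregate per value alias. The plain-Python sketch below reproduces the running examples above on the persons of the TinkerPop "modern" graph. `group_by` is a hypothetical helper. Keys are always tuples here, including the single-key case, which differs from the display format in the console output.

```python
from collections import defaultdict

# Persons of the TinkerPop "modern" graph and their out().count() over all labels.
PERSONS = [
    {"id": 1, "name": "marko", "age": 29},
    {"id": 2, "name": "vadas", "age": 27},
    {"id": 4, "name": "josh", "age": 32},
    {"id": 6, "name": "peter", "age": 35},
]
OUT_DEGREE = {1: 3, 2: 0, 4: 2, 6: 1}


def group_by(elements, keys, values=None):
    """`keys` maps aliases (k1, k2, ...) to per-element functions and `values`
    maps aliases (v1, v2, ...) to per-group aggregates."""
    buckets = defaultdict(list)
    for el in elements:
        buckets[tuple(fn(el) for fn in keys.values())].append(el)
    if values is None:                       # default: fold the group members
        return {k: [m["id"] for m in ms] for k, ms in buckets.items()}
    return {k: {alias: agg(ms) for alias, agg in values.items()}
            for k, ms in buckets.items()}


# group().by("name").by(count().as("v1"), values("age").sum().as("v2"))
stats = group_by(PERSONS, {"k1": lambda p: p["name"]},
                 {"v1": len, "v2": lambda ms: sum(m["age"] for m in ms)})
```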
Limitations#
This section lists steps that are not supported yet. Some will be supported in the near future, while others will remain unsupported; the reason is given for each such step.
To be Supported#
The following steps will be supported in the near future.
elementMap()#
Map the graph element to a map of T.id, T.label and the property values according to the given keys. If no property keys are provided, then all property values are retrieved.
Parameters: propertyKeys - the properties to retrieve.
g.V().elementMap()
path()#
Map the traverser to its path history.
g.V().out().out().path()
g.V().as("a").out().out().select("a").by("name").path()
local()#
g.V().fold().count(local)
g.V().values('age').fold().sum(local)
Will Not be Supported#
The following steps will remain unsupported.
repeat()#
repeat().times()
In graph pattern scenarios, repeat().times() can be replaced equivalently by the PathExpand-step:
# = g.V().out("2..3", "knows").endV()
g.V().repeat(out("knows")).times(2)
# = g.V().out("1..3", "knows").endV()
g.V().repeat(out("knows")).emit().times(2)
# = g.V().out("2..3", "knows").with('PATH_OPT', 'SIMPLE').endV()
g.V().repeat(out("knows").simplePath()).times(2)
# = g.V().out("1..3", "knows").with('PATH_OPT', 'SIMPLE').endV()
g.V().repeat(out("knows").simplePath()).emit().times(2)
repeat().until()
This is imperative rather than declarative syntax.
properties()#
The properties()-step retrieves and then unfolds properties from a graph element. The valueMap()-step presents all the properties of each graph element as a map. This is much clearer than the output of the properties()-step, which mixes up the properties of all graph elements in the same output.
sideEffect#
SideEffect-steps require maintaining global variables during execution, which is hard to implement in distributed scenarios. The unsupported steps include:
group("a")
groupCount("a")
aggregate("a")
sack()
branch#
Currently, we only support operations that merge multiple streams into one. The following operations, which split a stream, are unsupported:
branch()
choose()