LOAD FROM

The LOAD FROM clause loads external data sources as temporary tables and executes relational operations directly on them. This mechanism is designed for lightweight, on-the-fly analysis of external data without importing it into persistent graph storage.

LOAD FROM is particularly suitable for exploratory queries, data validation, and one-time analytical workloads.

Basic Syntax

LOAD FROM "<file_path>" (<options>) RETURN <column_list>;

Parameters

  • <file_path>: Specifies the external data source path. Currently, only local file system paths are supported.

  • <options>: Specifies format-related and performance-related options. Supported options depend on the data format.

  • RETURN <column_list>: Specifies the columns to return. Use * to return all columns.

LOAD FROM CSV

CSV (Comma-Separated Values) is the most commonly used format for temporary table operations.

CSV Format Options

The following options control how CSV files are parsed:

Option    Type  Default  Description
delim     char  |        Field delimiter. Can be a single character (e.g. ',') or an escape sequence (e.g. '\t')
header    bool  true     Whether the first row contains column names
quote     char  "        Quote character used to enclose field values
escape    char  \        Escape character for special characters
quoting   bool  true     Whether to enable quote processing
escaping  bool  true     Whether to enable escape character processing
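
To make the effect of these options concrete, here is a minimal Python sketch that mimics them with the standard csv module. It is not NeuG's parser; the function name parse_delimited and the sample text are assumptions made for illustration, and only the option names and defaults come from the table above.

```python
import csv
import io

def parse_delimited(text, delim="|", header=True, quote='"', escape="\\",
                    quoting=True, escaping=True):
    """Parse delimited text; return (column_names or None, data_rows).

    Illustration only: the keyword arguments mirror the documented option
    names, but the parsing is done by Python's csv module, not by NeuG.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delim,
        quotechar=quote,
        escapechar=escape if escaping else None,
        # QUOTE_NONE makes the reader treat quote characters as ordinary text.
        quoting=csv.QUOTE_MINIMAL if quoting else csv.QUOTE_NONE,
    )
    rows = list(reader)
    if header and rows:
        return rows[0], rows[1:]
    return None, rows

# Hypothetical input: the first data row has a quoted field containing the delimiter.
sample = 'name|age\n"Smith| Jr."|41\nmarko|29\n'
columns, rows = parse_delimited(sample)
```

With quoting enabled, the quoted field stays in one piece even though it contains the delimiter; passing quoting=False to the same sketch splits it at the embedded delimiter instead.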

Query Examples

All the following query examples use the Modern dataset.

Specifying CSV Options

LOAD FROM "person.csv" ( delim = ',', header = true ) RETURN name, age;

Column Reordering

Columns can be returned in any order, independent of their order in the source file:

LOAD FROM "knows.csv" (delim=',') RETURN weight, dst_name, src_name;

Column Aliases

Use AS to assign aliases to columns:

LOAD FROM "knows.csv" (delim=',') RETURN src_name AS src, dst_name AS dst, weight AS score;

Type Conversion

Use the CAST function to convert column values to a specific type:

LOAD FROM "person.csv" (delim=',') RETURN name, CAST(age, 'DOUBLE') AS double_age;

LOAD FROM JSON

Loading JSON is supported through NeuG's extension framework. See the JSON Extension documentation for details.

LOAD FROM PARQUET

Relational Operators with LOAD FROM

In addition to RETURN, LOAD FROM can be combined with standard relational operators to express more complex queries.

WHERE Filtering

Filter rows using the WHERE clause:

LOAD FROM "person.csv" (delim=',') WHERE age > 30 RETURN name, age;

Multiple conditions can be combined using AND, OR, and NOT:

LOAD FROM "person.csv" (delim=',') WHERE age > 25 AND age < 40 RETURN name, age;
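
As a rough mental model of how AND, OR, and NOT combine, the Python sketch below applies equivalent predicates to in-memory rows. The rows and the helper name where are hypothetical, the values are assumed to be already typed as integers, and NULL handling is not modeled.

```python
# Hypothetical rows standing in for the contents of person.csv.
people = [
    {"name": "marko", "age": 29},
    {"name": "vadas", "age": 27},
    {"name": "josh", "age": 32},
    {"name": "peter", "age": 35},
]

def where(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]

# WHERE age > 25 AND age < 30
both = where(people, lambda r: r["age"] > 25 and r["age"] < 30)

# WHERE age > 32 OR name = 'vadas'
either = where(people, lambda r: r["age"] > 32 or r["name"] == "vadas")

# WHERE NOT age > 30
negated = where(people, lambda r: not r["age"] > 30)
```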

Aggregate Functions

LOAD FROM supports common aggregate functions.

Row Count

LOAD FROM "person.csv" (delim=',') RETURN COUNT(*) AS total_count;

SUM, AVG, MIN, and MAX

LOAD FROM "person.csv" (delim=',') RETURN SUM(age) AS total_age, AVG(age) AS avg_age, MIN(age) AS min_age, MAX(age) AS max_age;

Grouped Aggregation

LOAD FROM "person.csv" (delim=',') RETURN name, AVG(age) AS avg_age;
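
The query above has no explicit grouping clause. Assuming LOAD FROM follows the usual Cypher convention, in which non-aggregated RETURN columns act as grouping keys, the result has one row per distinct name. The Python sketch below illustrates that reading on hypothetical rows; it is not NeuG code.

```python
from collections import defaultdict

# Hypothetical rows; two entries share a name so that grouping is visible.
people = [
    {"name": "marko", "age": 29},
    {"name": "marko", "age": 31},
    {"name": "josh", "age": 32},
]

def avg_age_by_name(rows):
    """Group rows by name (the non-aggregated column) and average age per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["name"]].append(row["age"])
    return {name: sum(ages) / len(ages) for name, ages in groups.items()}

avg_by_name = avg_age_by_name(people)
```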

Sorting and Limiting

Sorting Results

LOAD FROM "person.csv" (delim=',') RETURN name, age ORDER BY age DESC, name ASC;

Top-K Queries

LOAD FROM "person.csv" (delim=',') RETURN name, age ORDER BY age DESC LIMIT 10;
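
The combination of ORDER BY and LIMIT can be pictured as a sort followed by a slice. The Python sketch below orders hypothetical rows by age descending, breaks ties by name ascending, and keeps the first k rows; the function name top_k and the data are assumptions for illustration.

```python
# Hypothetical rows; two people share an age so the tie-breaker matters.
people = [
    {"name": "marko", "age": 29},
    {"name": "vadas", "age": 27},
    {"name": "josh", "age": 32},
    {"name": "peter", "age": 35},
    {"name": "anna", "age": 32},
]

def top_k(rows, k):
    """Equivalent of ORDER BY age DESC, name ASC LIMIT k on in-memory rows."""
    # Negating age gives descending order; name stays ascending as the tie-breaker.
    ordered = sorted(rows, key=lambda r: (-r["age"], r["name"]))
    return ordered[:k]

top_three = top_k(people, 3)
```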

Advanced Features

Performance Options

For large files, the following options can be enabled to improve performance and control memory usage:

Option      Type   Default         Description
batch_read  bool   false           Read data incrementally in batches. If disabled, all data is loaded into memory at once.
batch_size  int64  1048576 (1 MB)  Batch size in bytes when batch_read is enabled
parallel    bool   false           Enable parallel reading using multiple threads. By default, all available CPU cores on the machine are used.

Example

LOAD FROM "large_person.csv" (
    delim = ',',
    header = true,
    batch_read = true,
    batch_size = 2097152, // 2 MB
    parallel = true
)
RETURN name, age;
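
To show why batched reading bounds memory use, the Python sketch below reads a byte stream in fixed-size chunks and yields only complete lines, carrying any partial line over to the next chunk. This is a simplified illustration, not NeuG's reader: the function name read_batches is hypothetical, and quoted fields that contain newlines are not handled.

```python
import io

def read_batches(stream, batch_size=1048576):
    """Yield lists of complete lines, reading at most batch_size bytes per step."""
    leftover = b""
    while True:
        chunk = stream.read(batch_size)
        if not chunk:
            break
        lines = (leftover + chunk).split(b"\n")
        # The last piece may be an incomplete line; keep it for the next chunk.
        leftover = lines.pop()
        if lines:
            yield [line.decode("utf-8") for line in lines]
    if leftover:
        # Final line of a file that has no trailing newline.
        yield [leftover.decode("utf-8")]

# Hypothetical data and a tiny batch size so that several batches are produced.
data = io.BytesIO(b"name,age\nmarko,29\nvadas,27\njosh,32\n")
batches = list(read_batches(data, batch_size=12))
```

Only one chunk plus one partial line is held in memory at a time, which is the trade-off the batch_read and batch_size options describe.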