
Load Data

LOAD FROM is the foundation of NeuG’s data ingestion pipeline. It reads external files, automatically infers the schema (column names and types), and produces a temporary result set that exists only during query execution. No upfront schema definition is needed.

You can apply standard relational operations — projection, filtering, type casting, aggregation, sorting — directly on the loaded data. This makes LOAD FROM ideal for data exploration, validation, and ad-hoc analysis.

Basic Syntax

LOAD FROM "<file_path>" (<options>) [WHERE <condition>] RETURN <column_list> [ORDER BY <column> [ASC|DESC]] [LIMIT <n>];

Parameters

  • <file_path> — Path to the external data source. Currently only local file system paths are supported.
  • <options> — Format-specific and performance-related options (see below).
  • RETURN <column_list> — Columns to return. Use * to return all columns.
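
Putting the pieces together, a minimal invocation omits every optional clause and relies on the format defaults. As a sketch, assuming the `person.csv` file used in the later examples:

```
LOAD FROM "person.csv" RETURN *;
```

This infers the schema from the file and returns all columns of all rows.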

Format Options

CSV

CSV is the built-in format. The following options control how CSV files are parsed:

| Option   | Type | Default | Description |
|----------|------|---------|-------------|
| delim    | char | `\|`    | Field delimiter. Can be a single character (e.g. `','`) or an escape sequence (e.g. `'\t'`) |
| header   | bool | true    | Whether the first row contains column names |
| quote    | char | `"`     | Quote character used to enclose field values |
| escape   | char | `\`     | Escape character for special characters |
| quoting  | bool | true    | Whether to enable quote processing |
| escaping | bool | true    | Whether to enable escape character processing |

Example:

LOAD FROM "person.csv" (delim=',', header=true) RETURN name, age;
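
Multiple parsing options can be combined. As a sketch, assuming a hypothetical tab-separated file `people.tsv` whose fields contain no quoted values:

```
LOAD FROM "people.tsv" (delim='\t', header=true, quoting=false) RETURN name, age;
```

Disabling `quoting` skips quote processing entirely, which can simplify parsing when the format guarantees unquoted fields.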

JSON / JSONL

JSON support is provided via the JSON Extension. After installing and loading the extension, LOAD FROM can read .json and .jsonl files:

INSTALL json;
LOAD EXTENSION json;
LOAD FROM "person.json" RETURN *;

See the JSON Extension page for format-specific options and examples.
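
Reading JSON Lines input follows the same pattern. As a sketch, assuming a hypothetical `person.jsonl` file with one JSON object per line and the extension already installed and loaded:

```
LOAD FROM "person.jsonl" RETURN name, age;
```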

Parquet

Parquet support is planned for v0.2.

Relational Operations

LOAD FROM supports a rich set of relational operations on the loaded data. All the following examples use the Modern dataset.

Column Projection and Reordering

Columns can be returned in any order, independent of their order in the source file:

LOAD FROM "knows.csv" (delim=',') RETURN weight, dst_name, src_name;

Column Aliases

Use AS to assign aliases to columns:

LOAD FROM "knows.csv" (delim=',') RETURN src_name AS src, dst_name AS dst, weight AS score;

Distinct Values

Use RETURN DISTINCT to remove duplicate rows from the result:

LOAD FROM "person.csv" (delim=',') RETURN DISTINCT name;

You can also use DISTINCT with multiple columns:

LOAD FROM "person.csv" (delim=',') RETURN DISTINCT name, age;

Type Casting

Use the CAST function to convert column values to a specific type:

LOAD FROM "person.csv" (delim=',') RETURN name, CAST(age, 'DOUBLE') AS double_age;

WHERE Filtering

Filter rows using the WHERE clause. Multiple conditions can be combined using AND, OR, and NOT:

LOAD FROM "person.csv" (delim=',') WHERE age > 25 AND age < 40 RETURN name, age;
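
`OR` and `NOT` compose the same way. As a sketch against the same `person.csv`, this keeps rows that are either under 25 or whose name is not 'Alice':

```
LOAD FROM "person.csv" (delim=',') WHERE age < 25 OR NOT name = 'Alice' RETURN name, age;
```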

Aggregation

LOAD FROM supports common aggregate functions (COUNT, SUM, AVG, MIN, MAX) as well as grouped aggregation. There is no explicit GROUP BY clause: when RETURN mixes aggregated and non-aggregated columns, the non-aggregated columns act as implicit grouping keys (Cypher-style), as in the second example below:

LOAD FROM "person.csv" (delim=',') RETURN COUNT(*) AS total, AVG(age) AS avg_age, MIN(age) AS min_age, MAX(age) AS max_age;
LOAD FROM "person.csv" (delim=',') RETURN name, AVG(age) AS avg_age;

Sorting and Limiting

LOAD FROM "person.csv" (delim=',') RETURN name, age ORDER BY age DESC, name ASC LIMIT 10;
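
The optional clauses compose in a single statement. As a sketch combining filtering, sorting, and limiting over the same `person.csv`:

```
LOAD FROM "person.csv" (delim=',') WHERE age >= 18 RETURN name, age ORDER BY age ASC LIMIT 5;
```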

Performance Options

For large files, the following option can improve read performance:

| Option   | Type | Default | Description |
|----------|------|---------|-------------|
| parallel | bool | false   | Enable parallel reading using multiple threads (up to the number of CPU cores) |

Note: Batch reading options (batch_read, batch_size) are currently supported in COPY FROM, not in LOAD FROM.

Example:

LOAD FROM "large_person.csv" (delim=',', header=true, parallel=true) RETURN name, age;