LOAD FROM
The LOAD FROM clause loads external data sources as temporary tables and executes relational operations directly on them.
This mechanism is designed for lightweight, on-the-fly analysis of external data without importing it into persistent graph storage.
LOAD FROM is particularly suitable for exploratory queries, data validation, and one-time analytical workloads.
Basic Syntax
LOAD FROM "<file_path>" (<options>)
RETURN <column_list>;Parameters
-
<file_path>Specifies the external data source path. Currently, only local file system paths are supported. -
<options>Specifies format-related and performance-related options. Supported options depend on the data format. -
RETURN <column_list>Specifies the columns to return. Use*to return all columns.
LOAD FROM CSV
CSV (Comma-Separated Values) is the most commonly used format for temporary table operations.
CSV Format Options
The following options control how CSV files are parsed:
| Option | Type | Default | Description |
|---|---|---|---|
delim | char | | | Field delimiter. Can be a single character (e.g. ',') or an escape character (e.g. '\t') |
header | bool | true | Whether the first row contains column names |
quote | char | " | Quote character used to enclose field values |
escape | char | \ | Escape character for special characters |
quoting | bool | true | Whether to enable quote processing |
escaping | bool | true | Whether to enable escape character processing |
Query Examples
All the following query examples use the Modern dataset.
Specifying CSV Options
LOAD FROM "person.csv" (
delim = ',',
header = true
)
RETURN name, age;Column Reordering
Columns can be returned in any order, independent of their order in the source file:
LOAD FROM "knows.csv" (delim=',')
RETURN weight, dst_name, src_name;Column Aliases
Use AS to assign aliases to columns:
LOAD FROM "knows.csv" (delim=',')
RETURN src_name AS src, dst_name AS dst, weight AS score;Type Conversion
Use the CAST function to convert column values to a specific type:
LOAD FROM "person.csv" (delim=',')
RETURN name, CAST(age, 'DOUBLE') AS double_age;LOAD FROM JSON
LOAD JSON is supported by extension framework in NeuG, please refer to JSON Extension for more details.
LOAD FROM PARQUET
Relational Operators with LOAD FROM
In addition to RETURN, LOAD FROM can be combined with standard relational operators to express more complex queries.
WHERE Filtering
Filter rows using the WHERE clause:
LOAD FROM "person.csv" (delim=',')
WHERE age > 30
RETURN name, age;Multiple conditions can be combined using AND, OR, and NOT:
LOAD FROM "person.csv" (delim=',')
WHERE age > 25 AND age < 40
RETURN name, age;Aggregate Functions
LOAD FROM supports common aggregate functions.
Row Count
LOAD FROM "person.csv" (delim=',')
RETURN COUNT(*) AS total_count;Aggregation Functions
LOAD FROM "person.csv" (delim=',')
RETURN
SUM(age) AS total_age,
AVG(age) AS avg_age,
MIN(age) AS min_age,
MAX(age) AS max_age;Grouped Aggregation
LOAD FROM "person.csv" (delim=',')
RETURN name, AVG(age) AS avg_age;Sorting and Limiting
Sorting Results
LOAD FROM "person.csv" (delim=',')
RETURN name, age
ORDER BY age DESC, name ASC;Top-K Queries
LOAD FROM "person.csv" (delim=',')
RETURN name, age
ORDER BY age DESC
LIMIT 10;Advanced Features
Performance Options
For large files, the following options can be enabled to improve performance and control memory usage:
| Option | Type | Default | Description |
|---|---|---|---|
batch_read | bool | false | Read data incrementally in batches. If disabled, all data will be loaded into memory at once. |
batch_size | int64 | 1048576(1MB) | Batch size in bytes when batch_read is enabled |
parallel | bool | false | Enable parallel reading using multiple threads, maximum available CPU cores on the machine is utilized by default. |
Example
LOAD FROM "large_person.csv" (
delim = ',',
header = true,
batch_read = true,
batch_size = 2097152, // 2MB
parallel = true
)
RETURN name, age;