COPY FROM
COPY FROM persists external data into NeuG’s graph storage. It builds on top of LOAD FROM — internally, COPY FROM uses LOAD FROM to read and parse external files, then writes the result into node or relationship tables.
Schema Requirement
Currently, COPY FROM requires a predefined schema — you must create node/relationship tables before importing data. The columns in the external file must match the table properties.
Tip: Coming in v0.2: NeuG will support schema-flexible persistent import — allowing
COPY FROMto leverage the capability of type inference ofLOAD FROM, without requiring a predefined schema. This will make it much easier to quickly onboard new datasets.
Quick Start
Here is a complete example of importing a social network dataset from CSV.
Step 1: Prepare Data Files
users.csv:
id,name,age,email
1,Alice Johnson,30,[email protected]
2,Bob Smith,25,[email protected]
3,Carol Davis,28,[email protected]friendships.csv:
from_user_id,to_user_id,since_year
1,2,2020
2,3,2019
1,3,2021Step 2: Create Schema
CREATE NODE TABLE User(
id INT64 PRIMARY KEY,
name STRING,
age INT64,
email STRING
);
CREATE REL TABLE FRIENDS(
FROM User TO User,
since_year INT64
);Step 3: Import Data
-- Import nodes first (order matters — nodes must exist before relationships)
COPY User FROM "users.csv" (header=true, delimiter=",");
-- Then import relationships
COPY FRIENDS FROM "friendships.csv" (
from="User",
to="User",
header=true,
delimiter=","
);Step 4: Verify
MATCH (u:User) RETURN count(u) AS user_count;
MATCH (u1:User)-[f:FRIENDS]->(u2:User)
RETURN u1.name, u2.name, f.since_year
LIMIT 5;Import into Node Table
Create a node table and import data:
CREATE NODE TABLE person(id INT64, name STRING, age INT64, PRIMARY KEY(id));COPY person FROM "person.csv" (header=true);If data is spread across multiple files, use wildcard characters:
COPY person FROM "person*.csv" (header=true);Note: The number and order of columns in the CSV file must match the properties defined in the node table exactly.
Import into Relationship Table
Create a relationship table and import data:
CREATE REL TABLE knows(FROM person TO person, weight DOUBLE);COPY knows FROM "person_knows_person.csv" (from="person", to="person", header=true);Note: NeuG assumes the first two columns are the primary keys of the
FROMandTOnodes. The remaining columns correspond to relationship properties. Thefromandtoparameters must be specified to identify the endpoint node tables.
CSV Options
The following options control how CSV files are parsed during COPY FROM. These are the same options supported by LOAD FROM:
| Option | Type | Default | Description |
|---|---|---|---|
delim | char | | | Field delimiter |
header | bool | true | Whether the first row contains column names |
quote | char | " | Quote character |
escape | char | \ | Escape character |
quoting | bool | true | Whether to enable quote processing |
escaping | bool | true | Whether to enable escape character processing |
Column Remapping with LOAD FROM
Since COPY FROM builds on LOAD FROM, you can use a LOAD FROM subquery to preprocess data before persisting — including column reordering, renaming, type casting, and filtering.
Reordering Columns
When the file columns are in a different order from the table schema:
person_remap.csv:
age,name,id
39,marko,1
27,vadas,2
32,josh,3
35,peter,4COPY person FROM (
LOAD FROM "person_remap.csv"
RETURN id, name, age
);Remapping Relationship Endpoints
knows_remap.csv:
dst_name,src_name,weight
josh,marko,1.0
vadas,marko,0.5
peter,josh,0.8COPY knows FROM (
LOAD FROM "knows_remap.csv"
RETURN src_name AS src, dst_name AS dst, weight
);Filtering During Import
You can filter rows before persisting, avoiding the need to clean the source file:
COPY person FROM (
LOAD FROM "person.csv" (delim=',')
WHERE age >= 18
RETURN *
);For the full set of relational operations available in LOAD FROM subqueries, see the LOAD FROM reference.
Performance Options
| Option | Type | Default | Description |
|---|---|---|---|
batch_read | bool | false | Read data incrementally in batches. |
batch_size | int64 | 1048576 (1MB) | Batch size in bytes when batch_read is enabled. |
parallel | bool | false | Enable parallel reading using multiple threads (max core number). |
COPY User FROM "large_users.csv" (header=true, parallel=true);Import Order
Always import nodes before relationships, since relationship endpoints must reference existing nodes:
COPY User FROM "users.csv" (header=true);
COPY Company FROM "companies.csv" (header=true);
-- Only after all nodes are loaded:
COPY WORKS_FOR FROM "works_for.csv" (header=true);Large Datasets
For files larger than 1GB, consider splitting them:
# Split large CSV into smaller chunks
split -l 100000 large_users.csv users_chunk_
# Then COPY User FROM each chunked file.
# Note: only use `header=true` for the first chunkTroubleshooting
”Table does not exist” Error
Schema must be created before importing:
-- Wrong: importing before creating schema
COPY User FROM "users.csv" (header=true);
-- Correct: create schema first
CREATE NODE TABLE User(id INT64 PRIMARY KEY, name STRING);
COPY User FROM "users.csv" (header=true);“Column count mismatch” Error
The CSV must have the same number of columns as the table schema:
-- Wrong: missing 'age' column
id,name,email
1,Alice,[email protected]
-- Correct: all columns present
id,name,age,email
1,Alice,30,[email protected]“Primary key violation” Error
Check for duplicate IDs in the source file:
cut -d',' -f1 users.csv | sort | uniq -d