Import Data
NeuG supports importing data into a database from various file formats. Currently, CSV files are fully supported, with other formats such as Parquet and DataFrame =coming soon. This guide shows you how to efficiently load large datasets into your graph database.
Quick Start: Complete Import Workflow
Here’s a complete example of importing a social network dataset:
Step 1: Prepare Your Data Files
users.csv:
id,name,age,email
1,Alice Johnson,30,[email protected]
2,Bob Smith,25,[email protected]
3,Carol Davis,28,[email protected]friendships.csv:
from_user_id,to_user_id,since_year
1,2,2020
2,3,2019
1,3,2021Step 2: Create Schema
-- Create node table for users
CREATE NODE TABLE User(
id INT64 PRIMARY KEY,
name STRING,
age INT64,
email STRING
);
-- Create relationship table for friendships
CREATE REL TABLE FRIENDS(
FROM User TO User,
since_year INT64
);Step 3: Import Data
-- Import users first (order matters!)
COPY User FROM "users.csv" (header=true, delimiter=",");
-- Then import relationships
COPY FRIENDS FROM "friendships.csv" (
from="User",
to="User",
header=true,
delimiter=","
);Step 4: Verify Import
-- Check if data was imported correctly
MATCH (u:User) RETURN count(u) as user_count;
MATCH ()-[f:FRIENDS]->() RETURN count(f) as friendship_count;
-- Sample the data
MATCH (u1:User)-[f:FRIENDS]->(u2:User)
RETURN u1.name, u2.name, f.since_year
LIMIT 5;🎉 Success! You’ve imported a complete social network.
Detailed Reference
The COPY FROM command is the primary way to import data from CSV files into NeuG.
Copy from CSV
Import into Node Table
Create a node table person as follows:
CREATE NODE TABLE person(id INT64, name STRING, age INT64, PRIMARY KEY(id));The CSV file person.csv contains the following columns:
id|name|age
1|marko|29
2|vadas|27
4|josh|32
6|peter|35
...The following statement will import person.csv into person table.
COPY person FROM "person.csv" (header=true);If the data is spread across multiple files in a directory, you can use wildcard characters to specify all files at once. For example
COPY person FROM "person*.csv" (header=true);Note:
- The number of columns in the CSV file must be equal to the number of properties defined for the node.
- The order of columns in the CSV file must align with the order of properties defined for the node. For example, the above node has properties ordered: id, name, age, which correspond to the columns in the CSV file.
Import into Relationship Table
Create a relationship table knows using the following query:
CREATE REL TABLE knows(FROM person TO person, weight DOUBLE);The CSV file person_knows_person.csv contains the following columns:
person.id|person.id|weight
1|2|0.5
1|4|1.0
...The following statement imports the person_knows_person.csv file into the knows table.
COPY knows from "person_knows_person.csv" (from="person", to="person", header=true);Note
- When importing into a relationship table, NeuG assumes the first two columns in the file are the primary keys of the
FROMandTOnodes; The rest of the columns correspond to relationship properties. - The
FROMandTOparameters must be included in the CSV configurations to specify the labels of theFROMandTOnodes.
CSV configurations
The following options control how CSV files are parsed:
| Option | Type | Default | Description |
|---|---|---|---|
delim | char | | | Field delimiter. Can be a single character (e.g. ',') or an escape character (e.g. '\t') |
header | bool | true | Whether the first row contains column names |
quote | char | " | Quote character used to enclose field values |
escape | char | \ | Escape character for special characters |
quoting | bool | true | Whether to enable quote processing |
escaping | bool | true | Whether to enable escape character processing |
Loading with Column Remapping
By combining COPY FROM with LOAD FROM, NeuG allows users to remap columns from external files to target properties during data loading.
This mechanism enables flexible alignment between file schemas and graph schemas without requiring changes to the source files.
Assume that we already have the following two external files: person_remap.csv and knows_remap.csv.
person_remap.csv:
age,name
39,marko
27,vadas
32,josh
35,peterknows_remap.csv:
dst_name,src_name,weight
josh,marko,1.0
vadas,marko,0.5
peter,josh,0.8The following example remaps file columns to node properties:
COPY person FROM (
LOAD FROM "person_remap.csv"
RETURN name, age
);Similarly, column aliases can be used to remap relationship endpoints and properties:
COPY knows FROM (
LOAD FROM "knows_remap.csv"
RETURN src_name AS src, dst_name AS dst, weight
);🛠️ Troubleshooting
Common Import Issues
1. “Table does not exist” Error
-- ❌ Wrong: Importing before creating schema
COPY User FROM "users.csv" (header=true);
-- ✅ Correct: Create schema first
CREATE NODE TABLE User(id INT64 PRIMARY KEY, name STRING);
COPY User FROM "users.csv" (header=true);2. “Column count mismatch” Error
This happens when your CSV has different number of columns than your table schema.
Solution: Make sure CSV columns match table properties exactly:
-- ❌ Wrong: Missing 'age' column
id,name,email
1,Alice,[email protected]
-- ✅ Correct: All columns present
id,name,age,email
1,Alice,30,[email protected]3. “Primary key violation” Error
Solution: Check for duplicate IDs in your CSV file before importing:
# Check for duplicates in CSV (using command line tools)
cut -d',' -f1 users.csv | sort | uniq -d
# Or use a spreadsheet program to identify duplicate IDsAfter importing, verify no duplicates exist:
-- Count records per ID to find duplicates
MATCH (u:User)
RETURN u.id, count(u) as record_count
ORDER BY record_count DESC
LIMIT 10;Performance Tips
1. Import Order Matters
-- ✅ Always import nodes before relationships
COPY User FROM "users.csv" (header=true);
COPY Company FROM "companies.csv" (header=true);
-- Only after all nodes are loaded:
COPY WORKS_FOR FROM "works_for.csv" (header=true);2. Use Parallel Import for Large Files
-- Enable parallel processing for faster imports
COPY User FROM "large_users.csv" (header=true, parallel=true);| Option | Type | Default | Description |
|---|---|---|---|
batch_read | bool | false | Read data incrementally in batches. If disabled, all data will be loaded into memory at once. |
batch_size | int64 | 1048576(1MB) | Batch size in bytes when batch_read is enabled |
parallel | bool | false | Enable parallel reading using multiple threads, maximum available CPU cores on the machine is utilized by default. |
3. Batch Processing for Very Large Datasets
For files larger than 1GB, consider splitting them:
# Split large CSV into smaller chunks
split -l 100000 large_users.csv users_chunk_
# Then COPY User FROM each chunked file.
# Note that only use `header = true` for the first chunkComing Soon: Additional Format Support
NeuG is expanding import capabilities to support more data formats:
- Parquet Files: High-performance columnar format, ideal for analytics workloads
- DataFrame Integration: Direct import from pandas DataFrames and other data science tools
- and more …
These features will provide even more flexibility for data scientists and analysts working with diverse data sources.
What’s Next?
After successfully importing your data:
- Query your data - Learn Cypher query language
- Export data - Export data when needed