
Parquet Extension

Apache Parquet is a columnar storage format widely used in data engineering and analytics workloads. NeuG supports importing Parquet files through its Extension framework. After loading the Parquet Extension, users can read external Parquet files directly with the LOAD FROM syntax.

Install Extension

INSTALL PARQUET;

Load Extension

LOAD PARQUET;

Using Parquet Extension

LOAD FROM reads Parquet files and exposes their columns for querying. By default, the schema is inferred automatically from the Parquet file metadata.

Parquet Format Options

The following options control how Parquet files are read:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| buffered_stream | bool | true | Enable buffered I/O stream for improved sequential read performance. |
| pre_buffer | bool | false | Pre-buffer column data before decoding. Recommended for high-latency filesystems such as S3. |
| enable_io_coalescing | bool | true | Enable Arrow I/O read coalescing (hole-filling cache) to reduce I/O overhead when reading non-contiguous byte ranges. When true, uses lazy coalescing; when false, uses eager coalescing. |
| parquet_batch_rows | int64 | 65536 | Number of rows per Arrow record batch when converting Parquet row groups into in-memory batches. |

Query Examples

Basic Parquet Loading

Load all columns from a Parquet file:

LOAD FROM "person.parquet" RETURN *;

Specifying Batch Size

Tune memory usage by adjusting the number of rows read per batch:

LOAD FROM "person.parquet" (parquet_batch_rows=8192) RETURN *;
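
To get a feel for what a smaller parquet_batch_rows means, the arithmetic can be sketched in a few lines of Python. This is only an illustration of how a row count splits into batches, not NeuG's or Arrow's implementation: the function name batch_row_ranges and the 200,000-row file are hypothetical, and a real reader may also break batches at row-group boundaries.

```python
def batch_row_ranges(total_rows: int, batch_rows: int = 65536):
    """Yield (start, stop) row ranges holding at most batch_rows rows each."""
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    for start in range(0, total_rows, batch_rows):
        # The final batch is shorter when total_rows is not a multiple.
        yield start, min(start + batch_rows, total_rows)


# Hypothetical file with 200,000 rows (assumption for illustration).
default_batches = list(batch_row_ranges(200_000))
small_batches = list(batch_row_ranges(200_000, batch_rows=8192))
print(len(default_batches), len(small_batches))  # prints "4 25"
```

Under these assumptions, lowering the batch size from 65536 to 8192 means more, smaller batches are held in memory one at a time, which is the trade-off the option exposes.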

Choosing the I/O Coalescing Mode

Coalescing is lazy by default (enable_io_coalescing=true). Set the option to false to use eager coalescing for workloads that benefit from pre-fetching contiguous data:

LOAD FROM "person.parquet" (enable_io_coalescing=false) RETURN *;
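
The general idea behind read coalescing ("hole filling") can be sketched as follows: byte ranges that sit close together are merged into one larger read, so fewer I/O requests are issued. This is a minimal sketch of the concept only, not Arrow's or NeuG's actual code. The function name coalesce_ranges, the hole_size_limit parameter as used here, and the sample offsets are illustrative assumptions.

```python
def coalesce_ranges(ranges, hole_size_limit):
    """Merge (offset, length) byte ranges whose gap is <= hole_size_limit bytes."""
    merged = []  # list of [start, end) pairs, end exclusive
    for offset, length in sorted(ranges):
        end = offset + length
        if merged and offset - merged[-1][1] <= hole_size_limit:
            # The hole is small enough: extend the previous read to cover it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # The hole is too large: start a separate read.
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]


# Three hypothetical column-chunk ranges as (offset, length) pairs.
chunks = [(0, 100), (120, 80), (5000, 200)]
print(coalesce_ranges(chunks, hole_size_limit=64))
# prints [(0, 200), (5000, 200)]
```

In this sketch the first two ranges are separated by a 20-byte hole and become a single 200-byte read, while the distant third range stays a separate read.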

Column Projection

Return only specific columns from Parquet data:

LOAD FROM "person.parquet" RETURN fName, age;

Column Aliases

Use AS to assign aliases to columns:

LOAD FROM "person.parquet" RETURN fName AS name, age AS years;

Note: All relational operations supported by LOAD FROM, including type conversion, WHERE filtering, aggregation, sorting, and limiting, work the same way with Parquet files. See the LOAD FROM reference for the complete list of operations.