Iceberg

Embucket uses Apache Iceberg’s open table format to store, manage, and query your data at scale. You can use any Iceberg client to read and write data to Embucket tables.

Key points:

All Embucket tables use the Apache Iceberg format
Embucket exposes an Iceberg Catalog REST API for external tools
Compatible with Apache Spark, pyiceberg, and AWS S3 table buckets
Embucket doesn’t yet support some Iceberg features

About Apache Iceberg

Apache Iceberg provides an open table format that defines how to store and manage large datasets.

Embucket stores every table in Iceberg format and exposes an Iceberg Catalog REST API, so any Iceberg client can read and write your data. Embucket targets compatibility with Apache Spark, pyiceberg, and AWS S3 table buckets.

Compatibility and limitations

Embucket reads Iceberg data written by Apache Spark and pyiceberg, as well as data stored in AWS S3 table buckets. External tools can, in turn, read data that Embucket writes.
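As an illustration of external access, the following minimal sketch shows how a pyiceberg client might connect to a REST catalog and read a table. The endpoint URL, the warehouse name, and the table identifier are placeholders, not values documented for Embucket; substitute the ones from your own deployment.

```python
from typing import Dict


def rest_catalog_properties(uri: str, warehouse: str) -> Dict[str, str]:
    """Build the property map pyiceberg expects for a REST catalog."""
    return {"type": "rest", "uri": uri, "warehouse": warehouse}


def read_table(properties: Dict[str, str], identifier: str):
    """Load a table through the REST catalog and return it as a PyArrow table."""
    # Imported lazily so the helper above works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("embucket", **properties)
    table = catalog.load_table(identifier)
    return table.scan().to_arrow()


# Hypothetical values: replace with your own endpoint and warehouse name.
props = rest_catalog_properties("http://localhost:8080", "demo")
# arrow_table = read_table(props, "my_namespace.my_table")
```

The read call is left commented out because it needs a running catalog and an existing table.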

Missing features:

Credentials vending
Server side planning
Views
Table maintenance operations such as compaction and snapshot cleanup
UPDATE and DELETE operations
ALTER TABLE operations

Supported operations

MERGE INTO statements
Read operations for most Iceberg table formats
Write operations that create new data files

Read operations

The system reads delete files but doesn’t yet support position delete files, which are specialized files that mark individual rows for deletion by their position in a data file. This limitation exists because the underlying Parquet file reader doesn’t yet handle this deletion method.
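To make the missing piece concrete, here is a small sketch of what applying position deletes involves. In the Iceberg format, each position delete entry names a data file and the 0-based ordinal of a deleted row within it. The function and the file paths below are illustrative only and don't reflect Embucket's reader.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def apply_position_deletes(
    data_file_path: str,
    rows: List[Row],
    position_deletes: List[Tuple[str, int]],
) -> List[Row]:
    """Return the rows of one data file that survive position deletes.

    Each delete entry is (file_path, pos), where pos is the 0-based ordinal
    of the deleted row inside the data file named by file_path.
    """
    deleted: Set[int] = {
        pos for path, pos in position_deletes if path == data_file_path
    }
    return [row for pos, row in enumerate(rows) if pos not in deleted]


# Hypothetical data file contents and delete entries.
rows = [{"id": 1}, {"id": 2}, {"id": 3}]
deletes = [("s3://bucket/data/a.parquet", 1), ("s3://bucket/data/b.parquet", 0)]
live_rows = apply_position_deletes("s3://bucket/data/a.parquet", rows, deletes)
```

Here only the second row of `a.parquet` is dropped; the entry for `b.parquet` targets a different file and is ignored.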

Write operations

The platform doesn’t write delete files, which prevents DELETE and UPDATE operations. Still, it supports limited MERGE INTO statements that update and delete table rows. These operations use copy-on-write (CoW) semantics, a strategy that replaces entire Parquet files (columnar data files) with newly written ones instead of modifying individual rows.
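The copy-on-write idea can be sketched in a few lines. This is a simplified model, not Embucket's implementation: in-memory lists stand in for immutable Parquet files, and the `copy_on_write` and `change` names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
DataFile = List[Row]  # stand-in for the rows of one immutable Parquet file


def copy_on_write(
    files: Dict[str, DataFile],
    change: Callable[[Row], Optional[Row]],
) -> Dict[str, DataFile]:
    """Apply `change` to every row and rewrite only the affected files.

    `change` returns the row unchanged, a replacement row (an update),
    or None (a delete).
    """
    result: Dict[str, DataFile] = {}
    for name, rows in files.items():
        new_rows = [change(row) for row in rows]
        if new_rows == rows:
            result[name] = rows  # untouched file is carried over as-is
            continue
        # Affected file: write a complete replacement instead of editing in place.
        result[name + ".rewritten"] = [r for r in new_rows if r is not None]
    return result


def change(row: Row) -> Optional[Row]:
    if row["id"] == 1:
        return None  # delete
    if row["id"] == 2:
        return {**row, "v": "Y"}  # update
    return row


files = {
    "a": [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}],
    "b": [{"id": 3, "v": "z"}],
}
merged = copy_on_write(files, change)
```

File `a` contains changed rows, so it is replaced wholesale by a new file, while file `b` is reused untouched. The trade-off is that changing a single row costs a rewrite of the whole file that contains it.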

Catalog API

External tools read and write data through Embucket’s Iceberg Catalog REST API. This API implements part of the Apache Iceberg Catalog REST API.

Supported endpoints:

GET /v1/config
GET /v1/{wid}/namespaces
POST /v1/{wid}/namespaces
GET /v1/{wid}/namespaces/{namespace}
DELETE /v1/{wid}/namespaces/{namespace}
POST /v1/{wid}/namespaces/{namespace}/register
GET /v1/{wid}/namespaces/{namespace}/tables
POST /v1/{wid}/namespaces/{namespace}/tables
GET /v1/{wid}/namespaces/{namespace}/tables/{table}
DELETE /v1/{wid}/namespaces/{namespace}/tables/{table}
POST /v1/{wid}/namespaces/{namespace}/tables/{table}
POST /v1/{wid}/namespaces/{namespace}/tables/{table}/metrics
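For a quick check without an Iceberg client, the endpoints above can be called over plain HTTP. The sketch below builds the URL for the table-listing endpoint and fetches it with the Python standard library. The base URL and the values passed for `{wid}` and `{namespace}` are placeholders, and any authentication your deployment requires is not shown.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def tables_url(base_url: str, wid: str, namespace: str) -> str:
    """Build the URL for GET /v1/{wid}/namespaces/{namespace}/tables."""
    return (
        f"{base_url.rstrip('/')}/v1/{quote(wid, safe='')}"
        f"/namespaces/{quote(namespace, safe='')}/tables"
    )


def list_tables(base_url: str, wid: str, namespace: str) -> dict:
    """Call the table-listing endpoint and decode the JSON response."""
    with urlopen(tables_url(base_url, wid, namespace)) as response:
        return json.load(response)


# Hypothetical values: replace with your own endpoint and identifiers.
url = tables_url("http://localhost:8080/", "wh1", "sales")
# tables = list_tables("http://localhost:8080/", "wh1", "sales")
```

The request itself is commented out because it needs a running server.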