Iceberg
Embucket uses Apache Iceberg’s open table format to store, manage, and query your data at scale. You can use any Iceberg client to read and write data to Embucket tables.
Key points:
- All Embucket tables use the Apache Iceberg format
- Embucket exposes an Iceberg Catalog REST API for external tools
- Compatible with Apache Spark, pyiceberg, and AWS S3 table buckets
- Embucket doesn’t yet support some Iceberg features
About Apache Iceberg
Apache Iceberg provides an open table format that defines how to store and manage large datasets.
Embucket stores all data in Iceberg tables and exposes an Iceberg Catalog REST API that lets external tools read and write that data. Because every table uses the Iceberg format internally, any Iceberg client can access your data. Embucket targets compatibility with Apache Spark, pyiceberg, and AWS S3 table buckets.
Compatibility and limitations
Embucket can read Iceberg tables written by Apache Spark, pyiceberg, and AWS S3 table buckets. External tools can also read data that Embucket writes.
Missing features:
- Credentials vending
- Server-side planning
- Views
- Table maintenance operations such as compaction and snapshot cleanup
- UPDATE and DELETE operations
- ALTER TABLE operations
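Because credentials vending is missing, a client likely has to supply its own object-storage credentials instead of receiving them from the catalog. The sketch below builds catalog properties for pyiceberg under that assumption. The property names "type", "uri", "s3.region", "s3.access-key-id", and "s3.secret-access-key" are pyiceberg settings; the catalog URI, the region, and reading credentials from the standard AWS environment variables are illustrative assumptions, not documented Embucket values.

```python
import os


def embucket_catalog_properties(uri: str, region: str) -> dict[str, str]:
    """Build pyiceberg catalog properties for an Embucket REST catalog.

    Assumption: since Embucket doesn't vend credentials, the client passes
    its own S3 credentials, read here from the standard AWS environment
    variables. The URI argument is a placeholder, not a documented address.
    """
    props = {
        "type": "rest",       # use pyiceberg's REST catalog client
        "uri": uri,           # hypothetical Embucket catalog base URL
        "s3.region": region,
    }
    # Forward credentials only when both halves are set in the environment.
    key_id = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        props["s3.access-key-id"] = key_id
        props["s3.secret-access-key"] = secret
    return props


# With pyiceberg installed and an Embucket server running, you could then call:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("embucket", **embucket_catalog_properties(uri, region))
```

Keeping the credentials out of the code and in the environment means the same properties work for any client that talks to the catalog without vended credentials.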
Supported operations
- MERGE INTO statements
- Read operations for most Iceberg table formats
- Write operations that create new data files
Read operations
Embucket reads delete files but doesn’t yet support position delete files, which mark individual rows for deletion by data file path and row position. This limitation exists because the underlying Parquet file reader doesn’t yet handle this deletion method.
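Before querying a table that another engine wrote, you may want to check whether it contains position delete files. In the Iceberg table spec, each manifest entry's data_file record has a content field: 0 for data, 1 for position deletes, 2 for equality deletes. The sketch below assumes you have already read the manifest entries into simplified dicts with "file_path" and "content" keys; how you obtain them (for example, with a separate Iceberg client) is outside this sketch.

```python
from enum import IntEnum


class FileContent(IntEnum):
    # Values of the data_file.content field in the Iceberg table spec.
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


def position_delete_files(entries: list[dict]) -> list[str]:
    """Return the paths of position delete files among manifest entries.

    Each entry is a simplified stand-in for a manifest data_file record,
    holding only its "file_path" and "content" fields.
    """
    return [
        e["file_path"]
        for e in entries
        if FileContent(e["content"]) is FileContent.POSITION_DELETES
    ]
```

A non-empty result suggests the table relies on a deletion method that Embucket can't read yet.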
Write operations
Embucket doesn’t write delete files, which prevents DELETE and UPDATE operations. It does, however, support limited MERGE INTO statements that update and delete table rows. These statements use copy-on-write (CoW) semantics: instead of modifying individual rows, Embucket rewrites each affected Parquet file (a columnar data file) in full with the new data.
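To illustrate copy-on-write, here is a toy in-memory model, not Embucket's implementation. Data files are immutable lists of rows; a merge rewrites every file that contains at least one matching row into a new file, copies that file's unmatched rows across unchanged, and leaves files with no matches untouched. No delete files are produced. The DataFile type and the new_path naming callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict


@dataclass(frozen=True)
class DataFile:
    path: str
    rows: tuple  # rows stored in the file; treated as immutable


def copy_on_write_merge(
    files: list[DataFile],
    match: Callable[[Row], bool],
    update: Callable[[Row], Optional[Row]],
    new_path: Callable[[str], str],
) -> list[DataFile]:
    """Apply a MERGE-style change with copy-on-write semantics.

    update(row) returns the replacement row, or None to delete the row.
    Any file with a matching row is rewritten in full under a new path.
    """
    result = []
    for f in files:
        if not any(match(r) for r in f.rows):
            result.append(f)  # untouched file is carried over as-is
            continue
        new_rows = []
        for r in f.rows:
            if match(r):
                changed = update(r)
                if changed is not None:  # None means "delete this row"
                    new_rows.append(changed)
            else:
                new_rows.append(r)  # unmatched rows are copied into the new file
        result.append(DataFile(new_path(f.path), tuple(new_rows)))
    return result
```

The trade-off is write amplification: changing one row rewrites its whole file, but readers never need to apply delete files.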
Catalog API
Embucket exposes an Iceberg Catalog REST API that external tools use to read and write data. This API partially implements the Apache Iceberg Catalog REST API.
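Iceberg REST clients usually begin with GET /v1/config. The Iceberg REST spec says the response carries "defaults", applied before the client's own properties, and "overrides", applied after them. A minimal sketch of that merge order follows; which keys Embucket actually returns in either map is not documented here, so the sample values in any test are assumptions.

```python
def effective_config(client: dict[str, str], server: dict) -> dict[str, str]:
    """Merge catalog properties in the order the Iceberg REST spec gives
    for a GET /v1/config response: server "defaults" first, then the
    client's own properties, then server "overrides", which win over both."""
    merged = dict(server.get("defaults", {}))
    merged.update(client)
    merged.update(server.get("overrides", {}))
    return merged
```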
Supported endpoints:
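- GET /v1/config: get the catalog configuration
- GET /v1/{wid}/namespaces: list namespaces
- POST /v1/{wid}/namespaces: create a namespace
- GET /v1/{wid}/namespaces/{namespace}: load a namespace's metadata
- DELETE /v1/{wid}/namespaces/{namespace}: drop a namespace
- POST /v1/{wid}/namespaces/{namespace}/register: register a table from an existing metadata file
- GET /v1/{wid}/namespaces/{namespace}/tables: list tables in a namespace
- POST /v1/{wid}/namespaces/{namespace}/tables: create a table
- GET /v1/{wid}/namespaces/{namespace}/tables/{table}: load a table
- DELETE /v1/{wid}/namespaces/{namespace}/tables/{table}: drop a table
- POST /v1/{wid}/namespaces/{namespace}/tables/{table}: commit updates to a table
- POST /v1/{wid}/namespaces/{namespace}/tables/{table}/metrics: report table metrics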
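The endpoint paths above follow the Iceberg REST layout, with {wid} in the position where the spec places its optional prefix segment. The sketch below builds a table path and a create-namespace request with the standard library only; it constructs the request but does not send it. The spec joins multi-level namespaces with the 0x1F unit separator before percent-encoding; whether Embucket accepts multi-level namespaces is an assumption, and the base URL is a placeholder.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def table_path(wid: str, namespace: list[str], table: str) -> str:
    """Path for the /v1/{wid}/namespaces/{namespace}/tables/{table} endpoints.

    Namespace levels are joined with 0x1F and percent-encoded, as the
    Iceberg REST spec describes for multi-level namespaces.
    """
    ns = quote("\x1f".join(namespace), safe="")
    return f"/v1/{quote(wid, safe='')}/namespaces/{ns}/tables/{quote(table, safe='')}"


def create_namespace_request(base_url: str, wid: str, namespace: list[str]) -> Request:
    """Build (but don't send) a POST /v1/{wid}/namespaces request.

    The JSON body uses the spec's CreateNamespaceRequest shape.
    """
    body = json.dumps({"namespace": namespace, "properties": {}}).encode()
    url = f"{base_url.rstrip('/')}/v1/{quote(wid, safe='')}/namespaces"
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})
```

To actually send the request, you could pass it to urllib.request.urlopen against a running Embucket server.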