Iceberg

Embucket uses Apache Iceberg’s open table format to store, manage, and query your data at scale. You can use any Iceberg client to read and write data to Embucket tables.

Key points:

  • All Embucket tables use the Apache Iceberg format
  • Embucket exposes an Iceberg Catalog REST API for external tools
  • Compatible with Apache Spark, pyiceberg, and AWS S3 table buckets
  • Embucket doesn’t yet support some Iceberg features

Apache Iceberg provides an open table format that defines how to store and manage large datasets.

Embucket uses Iceberg for data storage and exposes an Iceberg Catalog REST API that lets external tools read and write data. Because all tables use the Iceberg format internally, any Iceberg client can access your data. Embucket targets compatibility with Apache Spark, pyiceberg, and AWS S3 table buckets.

Embucket reads data written by Apache Spark and pyiceberg, as well as data stored in AWS S3 table buckets. In turn, external tools can read data that Embucket writes.
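As a rough illustration of access from an Iceberg client, the sketch below builds the property map that pyiceberg's `load_catalog` accepts for a REST catalog. The URI, the warehouse name, and the catalog name `embucket` are placeholders chosen for this example, not values taken from this page, and pyiceberg must be installed separately.

```python
def rest_catalog_properties(uri: str, warehouse: str) -> dict:
    """Build the property map pyiceberg expects for a REST catalog."""
    return {
        "type": "rest",          # select pyiceberg's REST catalog implementation
        "uri": uri,              # base URL of the Iceberg Catalog REST API (assumed)
        "warehouse": warehouse,  # warehouse identifier (hypothetical name)
    }


def connect(uri: str, warehouse: str):
    """Open a catalog connection; needs network access and `pip install pyiceberg`."""
    # Imported lazily so the helper above can be used without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("embucket", **rest_catalog_properties(uri, warehouse))


# Usage against a running deployment (placeholder values):
#   catalog = connect("http://localhost:3000/catalog", "demo")
#   print(catalog.list_namespaces())
```

Check the base URL and warehouse identifier of your own deployment before relying on these values.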

Not yet supported:

  • Credentials vending
  • Server-side planning
  • Views
  • Table maintenance operations such as compaction and snapshot cleanup
  • UPDATE and DELETE operations
  • ALTER TABLE operations

Supported:

  • MERGE INTO statements (limited)
  • Read operations for most Iceberg table formats
  • Write operations that create new data files

Embucket reads delete files, with one exception: it doesn't yet support position delete files, which are specialized files that mark individual rows for deletion by their position in a data file. This limitation exists because the underlying Parquet file reader doesn't yet handle this deletion method.
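To make the idea of a position delete concrete, here is a small conceptual model in plain Python. It is not Embucket's reader: lists of rows stand in for data files, and the file names are hypothetical. A position delete entry names a data file and a row position within it, and a reader that honors such entries must skip those rows.

```python
from collections import defaultdict


def apply_position_deletes(data_files, position_deletes):
    """Return the live rows of each data file after applying position deletes.

    data_files: {file_path: [row, ...]} standing in for Parquet data files.
    position_deletes: iterable of (file_path, row_position) pairs.
    """
    # Group deleted positions by the data file they refer to.
    deleted = defaultdict(set)
    for path, pos in position_deletes:
        deleted[path].add(pos)

    # Keep only rows whose position is not marked as deleted.
    return {
        path: [row for pos, row in enumerate(rows) if pos not in deleted[path]]
        for path, rows in data_files.items()
    }


files = {"a.parquet": ["r0", "r1", "r2"], "b.parquet": ["r3"]}
live = apply_position_deletes(files, [("a.parquet", 1)])
print(live)  # {'a.parquet': ['r0', 'r2'], 'b.parquet': ['r3']}
```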

Embucket doesn’t write delete files, which prevents DELETE and UPDATE operations. Still, it supports limited MERGE INTO statements that update and delete table rows. These operations use Copy-on-Write (CoW) semantics, a strategy that overwrites entire Parquet files (columnar data files) with new data instead of modifying individual rows.
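The Copy-on-Write strategy can be sketched with a toy model, again using dictionaries of rows in place of Parquet files. The function name, the `.v2` suffix for rewritten files, and the convention that returning `None` deletes a row are all assumptions made for this illustration, not Embucket behavior.

```python
def copy_on_write_update(files, predicate, update):
    """Rewrite every file that contains a matching row; leave the rest untouched.

    files: {file_name: [row_dict, ...]} standing in for Parquet data files.
    predicate: returns True for rows that must change.
    update: returns the new row, or None to delete the row.
    Returns (new_files, rewritten_names).
    """
    new_files = {}
    rewritten = []
    for name, rows in files.items():
        if any(predicate(row) for row in rows):
            # The whole file is replaced, even if only one row changed.
            updated = [update(row) if predicate(row) else row for row in rows]
            new_files[name + ".v2"] = [row for row in updated if row is not None]
            rewritten.append(name)
        else:
            # Files without matching rows are carried over as-is.
            new_files[name] = rows
    return new_files, rewritten


files = {
    "f1": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "f2": [{"id": 3, "v": "c"}],
}
new_files, rewritten = copy_on_write_update(
    files, lambda r: r["id"] == 2, lambda r: {**r, "v": "B"}
)
print(rewritten)  # ['f1']
```

The trade-off this illustrates is that a single changed row costs a full file rewrite, while readers never need to merge delete files at query time.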

Embucket exposes an Iceberg Catalog REST API that external tools use to read and write data. This API partially implements the Apache Iceberg Catalog REST API and covers these endpoints:

  • GET /v1/config
  • GET /v1/{wid}/namespaces
  • POST /v1/{wid}/namespaces
  • GET /v1/{wid}/namespaces/{namespace}
  • DELETE /v1/{wid}/namespaces/{namespace}
  • POST /v1/{wid}/namespaces/{namespace}/register
  • GET /v1/{wid}/namespaces/{namespace}/tables
  • POST /v1/{wid}/namespaces/{namespace}/tables
  • GET /v1/{wid}/namespaces/{namespace}/tables/{table}
  • DELETE /v1/{wid}/namespaces/{namespace}/tables/{table}
  • POST /v1/{wid}/namespaces/{namespace}/tables/{table}
  • POST /v1/{wid}/namespaces/{namespace}/tables/{table}/metrics
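For a sense of how these paths are assembled, the following sketch builds endpoint URLs with the Python standard library and issues a table listing request. The base URL and the `wid` value are placeholders for this example, the helper names are hypothetical, and authentication is not covered.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def endpoint(base_url, wid, *segments):
    """Join base URL, 'v1', the {wid} path parameter, and further path segments."""
    parts = ["v1", wid, *segments]
    # Percent-encode each segment so names with special characters stay valid.
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def list_tables(base_url, wid, namespace):
    """GET /v1/{wid}/namespaces/{namespace}/tables and decode the JSON body."""
    with urlopen(endpoint(base_url, wid, "namespaces", namespace, "tables")) as resp:
        return json.load(resp)


# Building a URL needs no network access (placeholder base URL and wid):
print(endpoint("http://localhost:3000/catalog", "demo", "namespaces", "sales", "tables"))
# http://localhost:3000/catalog/v1/demo/namespaces/sales/tables
```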