Configuration

This guide covers how to configure Embucket for your specific deployment needs. You can configure Embucket using environment variables, configuration files, or command-line options to control storage backends, networking, authentication, and performance settings.

Embucket supports three configuration methods, so you can adapt it for development, testing, and production environments. When the same setting appears in more than one place, the methods apply in this order of precedence:

  1. Command-line arguments - Highest precedence
  2. Environment variables - Medium precedence
  3. Configuration file (.env) - Lowest precedence

Create a .env file in your working directory with key-value pairs. Embucket automatically loads this file at startup.

# Basic configuration example
METASTORE_CONFIG=config/metastore.yaml
JWT_SECRET=your-secret-key

Set environment variables in your shell or deployment environment. Environment variables override configuration file settings.

export METASTORE_CONFIG=config/metastore.yaml
export JWT_SECRET=your-secret-key

Pass configuration options directly to the embucketd command. Command-line arguments override both environment variables and configuration file settings.

embucketd --metastore-config config/metastore.yaml --port 3001
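
The precedence rules can be modelled in a few lines of shell. The sketch below only illustrates the order described above; it is not Embucket's actual loader. The function name resolve_setting and its three arguments are hypothetical, and the .env parsing assumes plain KEY=value lines without quotes or export prefixes.

```shell
#!/bin/sh
# Illustrative model of the precedence order:
# command-line argument > environment variable > .env file.
# resolve_setting is a hypothetical helper, not part of Embucket.

resolve_setting() {
  name="$1"       # variable name, for example BUCKET_PORT
  cli_value="$2"  # value given on the command line, or empty if none
  env_file="$3"   # path to the .env file

  # 1. A command-line argument wins whenever one is given.
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
    return 0
  fi

  # 2. Otherwise use the environment variable, if it is set and non-empty.
  eval "env_value=\${$name:-}"
  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
    return 0
  fi

  # 3. Finally fall back to the last matching KEY=value line in the .env file.
  if [ -f "$env_file" ]; then
    sed -n "s/^${name}=//p" "$env_file" | tail -n 1
  fi
}
```

With BUCKET_PORT=3000 in .env and BUCKET_PORT=3001 exported in the shell, calling resolve_setting BUCKET_PORT "" .env prints 3001; passing 3002 as the second argument prints 3002.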

These settings control Embucket’s basic operation, and most deployments need them.

Setting  Environment Variable  Default    Description
--host   BUCKET_HOST           localhost  Host address to bind the API server
--port   BUCKET_PORT           3000       Port for the API server

Setting               Environment Variable  Default   Description
--jwt-secret          JWT_SECRET            None      Required secret key for JWT token generation
--auth-demo-user      AUTH_DEMO_USER        embucket  Username for demo authentication
--auth-demo-password  AUTH_DEMO_PASSWORD    embucket  Password for demo authentication
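
Because JWT_SECRET has no default, a start-up script may want to fail early when it is missing. The following is a minimal sketch, assuming the settings are supplied through environment variables only (it does not inspect .env or command-line arguments). check_required_config is a hypothetical helper, not an Embucket command.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the authentication settings.
# It only looks at environment variables, not at .env or CLI flags.

check_required_config() {
  status=0

  # JWT_SECRET has no default, so it must be provided explicitly.
  if [ -z "${JWT_SECRET:-}" ]; then
    echo "error: JWT_SECRET is not set" >&2
    status=1
  fi

  # Warn (without failing) when both demo credentials are still the defaults.
  if [ "${AUTH_DEMO_USER:-embucket}" = "embucket" ] &&
     [ "${AUTH_DEMO_PASSWORD:-embucket}" = "embucket" ]; then
    echo "warning: demo credentials are still set to the defaults" >&2
  fi

  return "$status"
}
```

A wrapper script could then run check_required_config and start embucketd only when the check succeeds.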

The Embucket metastore uses a YAML file that defines volumes, databases, schemas, and tables. Specify the path with the --metastore-config command-line argument or the METASTORE_CONFIG environment variable.

In Embucket, a volume defines the storage location for data. Configure either an S3 volume or an S3 tables volume.

Use an S3 volume to point at Iceberg tables already stored on S3, such as Snowflake Open Catalog managed tables.

volumes:
  - ident: tpch
    type: s3
    region: us-east-2
    bucket: embucket-lakehouse
    credentials:
      credential_type: access_key
      aws-access-key-id: <your-aws-access-key-id>
      aws-secret-access-key: <your-aws-secret-access-key>
databases:
  - ident: demo
    volume: tpch
schemas:
  - database: demo
    schema: tpch_10
tables:
  - database: demo
    schema: tpch_10
    table: customer
    metadata_location: s3://<your-bucket-name>/metadata/00001-eea1cccb-38a4-4fe2-8c95-c01dae9d0c60.metadata.json
  - database: demo
    schema: tpch_10
    table: lineitem
    metadata_location: s3://<your-bucket-name>/metadata/00001-d777220e-d508-4033-a229-8c4c8d8fe514.metadata.json

In this example, the configuration file:

  • Defines a single tpch volume with type s3
  • Creates one database, demo, that uses the tpch volume
  • Creates the tpch_10 schema inside the demo database
  • Creates two tables, customer and lineitem, inside the tpch_10 schema

When loaded, Embucket creates a database demo with schema tpch_10 and two tables customer and lineitem. It uses the tpch volume information in the configuration file to access data.
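
Since each table entry points at an Iceberg metadata file, a quick sanity check of the metadata_location values can catch typos before Embucket loads the file. The sketch below assumes every location should start with s3:// and end in .metadata.json, as in the example above. check_metadata_locations is a hypothetical helper; it only pattern-matches text and does not parse YAML or contact S3.

```shell
#!/bin/sh
# Hypothetical lint for metadata_location entries in a metastore file.
# Assumption: valid locations look like s3://.../<name>.metadata.json.

check_metadata_locations() {
  config_file="$1"

  # Extract the value of every metadata_location key, ignoring indentation,
  # then keep only the values that do NOT match the expected shape.
  bad=$(sed -n 's/^[[:space:]]*metadata_location:[[:space:]]*//p' "$config_file" |
        grep -v -E '^s3://.+\.metadata\.json$' || true)

  if [ -n "$bad" ]; then
    # Report each unexpected value on its own line.
    printf '%s\n' "$bad" | sed 's/^/unexpected metadata_location: /' >&2
    return 1
  fi
  return 0
}
```

For example, check_metadata_locations config/metastore.yaml returns a non-zero status and lists the offending values if any entry has a different scheme or suffix.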

Use an S3 tables volume for data managed by AWS S3 table buckets.

volumes:
  - ident: tpch
    type: s3-tables
    database: demo
    credentials:
      credential_type: access_key
      aws-access-key-id: <your-aws-access-key-id>
      aws-secret-access-key: <your-aws-secret-access-key>
    arn: arn:aws:s3tables:us-east-2:767397688925:bucket/my-testing-minimal

In this example, the configuration file:

  • Defines a single tpch volume of type s3-tables
  • Maps that volume to the demo database; each s3-tables volume maps to exactly one database

When loaded, Embucket creates a database demo and lists all namespaces and tables that exist in the provided S3 Table bucket.
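
A malformed ARN is an easy mistake to make in this file. As a rough sketch, the hypothetical is_s3tables_arn function below checks that a value has the general shape arn:aws:s3tables:<region>:<12-digit account ID>:bucket/<name>. It assumes the standard aws partition and does not verify that the table bucket exists.

```shell
#!/bin/sh
# Hypothetical shape check for an S3 table bucket ARN.
# Assumptions: standard "aws" partition, 12-digit account ID, and a bucket
# name of 3-63 lowercase letters, digits, or hyphens.

is_s3tables_arn() {
  printf '%s\n' "$1" |
    grep -E -q '^arn:aws:s3tables:[a-z0-9-]+:[0-9]{12}:bucket/[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$'
}
```

The ARN from the example above passes this check, while an ordinary S3 bucket ARN such as arn:aws:s3:::my-bucket does not.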

Setting        Environment Variable  Default  Description
--data-format  DATA_FORMAT           json     Data serialization format for Snowflake v1 API: json or arrow

Setting  Environment Variable             Default  Description
N/A      MEM_POOL_TYPE                    greedy   Memory pool allocation strategy
N/A      MEM_POOL_SIZE_MB                 4096     Memory pool size in megabytes
N/A      MEM_ENABLE_TRACK_CONSUMERS_POOL  true     Enable memory pool consumer tracking
N/A      DISK_POOL_SIZE_MB                102400   Disk pool size in megabytes
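
The pool sizes are given in megabytes, which can be awkward when capacity is planned in gibibytes. The helper below is a small convenience sketch: gib_to_mb is a hypothetical function, the 8 GiB and 200 GiB figures are arbitrary illustrations rather than recommendations, and it assumes 1 GiB = 1024 MB, which is consistent with the defaults being multiples of 1024.

```shell
#!/bin/sh
# Hypothetical helper: convert whole gibibytes to the megabyte values
# expected by MEM_POOL_SIZE_MB and DISK_POOL_SIZE_MB.
# Assumption: 1 GiB = 1024 MB.

gib_to_mb() {
  echo $(( $1 * 1024 ))
}

# Illustrative sizes only: an 8 GiB memory pool and a 200 GiB disk pool.
MEM_POOL_SIZE_MB=$(gib_to_mb 8)
DISK_POOL_SIZE_MB=$(gib_to_mb 200)
export MEM_POOL_SIZE_MB DISK_POOL_SIZE_MB
```

These variables have no command-line equivalents in the table above, so they need to be set in the environment or in the .env file.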

Embucket doesn’t create default volumes, databases, or schemas. Seed the metastore with a configuration file or create the necessary entries before querying.

Setting                   Environment Variable  Default               Description
--tracing-level           TRACING_LEVEL         info                  Log level: off, info, debug, or trace
--tracing-span-processor  span_processor        batch-span-processor  Tracing span processor type

View all available command-line options:

embucketd --help

Common command patterns:

# Start with custom port and S3 backend
embucketd --port 3001 --backend s3 --bucket my-data-bucket
# Start with file backend and custom storage path
embucketd --backend file --file-storage-path /opt/embucket/data
# Start with debug logging
embucketd --tracing-level debug