Configuration
This guide covers how to configure Embucket for your specific deployment needs. You can configure Embucket using environment variables, configuration files, or command-line options to control storage backends, networking, authentication, and performance settings.
Embucket supports three configuration methods, so you can adapt it for development, testing, and production environments.
Configuration methods
You can configure Embucket using three methods, in order of precedence:
- Command-line arguments - Highest precedence
- Environment variables - Medium precedence
- Configuration file (.env) - Lowest precedence
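The precedence order above can be pictured as a lookup that stops at the first source supplying a value. The following shell sketch only illustrates that rule; it is not Embucket's actual implementation, and the `resolve_setting` helper and its arguments are hypothetical.

```shell
# Illustrative only: resolve one setting using the documented precedence.
# resolve_setting is a hypothetical helper, not part of Embucket.
#   $1 = value passed on the command line (empty if not passed)
#   $2 = name of the environment variable
#   $3 = path to the .env file
resolve_setting() {
  _cli_value=$1
  _env_name=$2
  _env_file=$3

  # 1. Command-line argument: highest precedence.
  if [ -n "$_cli_value" ]; then
    printf '%s\n' "$_cli_value"
    return 0
  fi

  # 2. Environment variable: medium precedence (indirect lookup by name).
  eval "_env_value=\${$_env_name:-}"
  if [ -n "$_env_value" ]; then
    printf '%s\n' "$_env_value"
    return 0
  fi

  # 3. .env file: lowest precedence. Use the last matching KEY=value line.
  if [ -f "$_env_file" ]; then
    _line=$(grep -E "^${_env_name}=" "$_env_file" | tail -n 1) || true
    if [ -n "$_line" ]; then
      printf '%s\n' "${_line#*=}"
      return 0
    fi
  fi

  return 1
}
```

For example, `resolve_setting 3001 BUCKET_PORT .env` prints `3001` whatever `BUCKET_PORT` or the `.env` file contain, because a command-line value is checked first.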
Configuration file
Create a .env file in your working directory with key-value pairs. Embucket automatically loads this file at startup.

    # Basic configuration example
    METASTORE_CONFIG=config/metastore.yaml
    JWT_SECRET=your-secret-key

Environment variables
Set environment variables in your shell or deployment environment. Environment variables override configuration file settings.

    export METASTORE_CONFIG=config/metastore.yaml
    export JWT_SECRET=your-secret-key

Command-line arguments
Pass configuration options directly to the embucketd command. Command-line arguments override both environment variables and configuration file settings.

    embucketd --metastore-config config/metastore.yaml --port 3001

Core settings
These settings control Embucket’s basic operation and most deployments need them.
Network configuration
| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --host | BUCKET_HOST | localhost | Host address to bind the API server |
| --port | BUCKET_PORT | 3000 | Port for the API server |
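As a minimal sketch, the network settings can be supplied through the two environment variables from the table. The values below are examples, and `is_valid_port` is a hypothetical pre-flight check rather than an Embucket feature; Embucket may validate the port differently.

```shell
# Bind to all interfaces on port 3001 (example values).
export BUCKET_HOST=0.0.0.0
export BUCKET_PORT=3001

# Hypothetical helper: accept only integers in the TCP port range 1-65535.
is_valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

if is_valid_port "$BUCKET_PORT"; then
  echo "BUCKET_PORT=$BUCKET_PORT looks valid"
else
  echo "BUCKET_PORT=$BUCKET_PORT is not a valid port" >&2
fi
```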
Authentication
| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --jwt-secret | JWT_SECRET | None | Required secret key for JWT token generation |
| --auth-demo-user | AUTH_DEMO_USER | embucket | Username for demo authentication |
| --auth-demo-password | AUTH_DEMO_PASSWORD | embucket | Password for demo authentication |
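Because JWT_SECRET has no default, it has to be set before Embucket starts. One possible way to produce a value is sketched below; it assumes Embucket accepts any sufficiently long random string as the secret, which this guide does not state explicitly.

```shell
# Generate 32 random bytes and hex-encode them (64 hexadecimal characters).
# Assumption: any long random string is acceptable as JWT_SECRET.
JWT_SECRET=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
export JWT_SECRET

# Print only the length so the secret itself never reaches the terminal or logs.
echo "JWT_SECRET length: ${#JWT_SECRET}"
```

Storing the value in a secrets manager or the deployment environment, rather than in a committed .env file, keeps it out of version control.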
Metastore configuration
The Embucket metastore uses a YAML file that defines volumes, databases, schemas, and tables.
Specify the path with the --metastore-config command-line argument or the METASTORE_CONFIG environment variable.
In Embucket, a volume defines the storage location for data. Configure either an S3 volume or an S3 tables volume.
S3 volume
Use an S3 volume to point at Iceberg tables already stored on S3, for example Snowflake Open Catalog Managed tables.

    volumes:
      - ident: tpch
        type: s3
        region: us-east-2
        bucket: embucket-lakehouse
        credentials:
          credential_type: access_key
          aws-access-key-id: <your-aws-access-key-id>
          aws-secret-access-key: <your-aws-secret-access-key>

    databases:
      - ident: demo
        volume: tpch

    schemas:
      - database: demo
        schema: tpch_10

    tables:
      - database: demo
        schema: tpch_10
        table: customer
        metadata_location: s3://<your-bucket-name>/metadata/00001-eea1cccb-38a4-4fe2-8c95-c01dae9d0c60.metadata.json

      - database: demo
        schema: tpch_10
        table: lineitem
        metadata_location: s3://<your-bucket-name>/metadata/00001-d777220e-d508-4033-a229-8c4c8d8fe514.metadata.json

In this example:
- Define a single tpch volume with type s3
- Create one database, demo, that uses the tpch volume
- Create the tpch_10 schema inside the demo database
- Create two tables, customer and lineitem, inside the tpch_10 schema
When loaded, Embucket creates a database demo with schema tpch_10 and two tables customer and lineitem. It uses the tpch volume information in the configuration file to access data.
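Before pointing Embucket at a metastore file laid out like the S3 volume example, a quick pre-flight check can catch a missing file or section. The sketch below is a plain-text check, not a YAML parser, and `check_metastore_config` is a hypothetical helper. Treating all four sections as required is an assumption based on that example only.

```shell
# Hypothetical pre-flight helper, not part of Embucket.
# Verifies that the file exists and contains each top-level section used in
# the S3 volume example: volumes, databases, schemas, tables.
check_metastore_config() {
  _file=$1
  if [ ! -f "$_file" ]; then
    echo "missing metastore config: $_file" >&2
    return 1
  fi
  for _section in volumes databases schemas tables; do
    if ! grep -q "^${_section}:" "$_file"; then
      echo "missing top-level section: $_section" >&2
      return 1
    fi
  done
  return 0
}

# Usage: export the path only after the file passes the check.
# check_metastore_config config/metastore.yaml && export METASTORE_CONFIG=config/metastore.yaml
```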
S3 tables volume
Use an S3 tables volume for data managed by AWS S3 table buckets.

    volumes:
      - ident: tpch
        type: s3-tables
        database: demo
        credentials:
          credential_type: access_key
          aws-access-key-id: <your-aws-access-key-id>
          aws-secret-access-key: <your-aws-secret-access-key>
        arn: arn:aws:s3tables:us-east-2:767397688925:bucket/my-testing-minimal

In this example:
- Define a single tpch volume of type s3-tables
- Each s3-tables volume maps to one database; here, demo
When loaded, Embucket creates a database demo and lists all namespaces and tables that exist in the provided S3 Table bucket.
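The arn value follows the standard AWS ARN layout (arn:partition:service:region:account-id:resource), so the region and table bucket name can be read straight from it. The sketch below splits the ARN from the example above to double-check those parts before adding it to the configuration; the variable names are illustrative.

```shell
arn="arn:aws:s3tables:us-east-2:767397688925:bucket/my-testing-minimal"

# Split on ':' into the six standard ARN fields.
IFS=: read -r arn_prefix partition service region account resource <<EOF
$arn
EOF

# The resource field has the form bucket/<table-bucket-name>.
table_bucket=${resource#bucket/}

echo "service=$service region=$region bucket=$table_bucket"
```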
Advanced settings
Data format
| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --data-format | DATA_FORMAT | json | Data serialization format for Snowflake v1 API: json or arrow |
Memory and disk pools
| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| N/A | MEM_POOL_TYPE | greedy | Memory pool allocation strategy |
| N/A | MEM_POOL_SIZE_MB | 4096 | Memory pool size in megabytes |
| N/A | MEM_ENABLE_TRACK_CONSUMERS_POOL | true | Enable memory pool consumer tracking |
| N/A | DISK_POOL_SIZE_MB | 102400 | Disk pool size in megabytes |
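These pool settings have no command-line flag, so they are set through the environment. The sketch below converts sizes given in GiB to the megabyte values the variables expect, using 1024 MB per GiB, which matches the defaults (4096 and 102400). The 8 GiB and 200 GiB figures are arbitrary examples, not recommendations.

```shell
# Example sizes in GiB (arbitrary values chosen for illustration).
mem_gib=8
disk_gib=200

export MEM_POOL_TYPE=greedy                    # documented default
export MEM_POOL_SIZE_MB=$((mem_gib * 1024))    # 8 GiB  -> 8192
export DISK_POOL_SIZE_MB=$((disk_gib * 1024))  # 200 GiB -> 204800

echo "MEM_POOL_SIZE_MB=$MEM_POOL_SIZE_MB DISK_POOL_SIZE_MB=$DISK_POOL_SIZE_MB"
```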
Embucket doesn’t create default volumes, databases, or schemas. Seed the metastore with a configuration file or create the necessary entries before querying.
Tracing and debugging
| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --tracing-level | TRACING_LEVEL | info | Log level: off, info, debug, or trace |
| --tracing-span-processor | span_processor | batch-span-processor | Tracing span processor type |
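A wrapper script can guard against a mistyped log level before starting Embucket. The sketch below mirrors the four values listed in the table; `is_valid_tracing_level` is a hypothetical helper, and Embucket itself may accept or reject values differently.

```shell
# Hypothetical helper: accept only the levels listed in the table above.
is_valid_tracing_level() {
  case "$1" in
    off|info|debug|trace) return 0 ;;
    *) return 1 ;;
  esac
}

# Fall back to the documented default when TRACING_LEVEL is unset or empty.
level=${TRACING_LEVEL:-info}

if is_valid_tracing_level "$level"; then
  export TRACING_LEVEL="$level"
else
  echo "unsupported TRACING_LEVEL: $level" >&2
fi
```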
Command-line reference
View all available command-line options:

    embucketd --help

Common command patterns:

    # Start with custom port and S3 backend
    embucketd --port 3001 --backend s3 --bucket my-data-bucket

    # Start with file backend and custom storage path
    embucketd --backend file --file-storage-path /opt/embucket/data

    # Start with debug logging
    embucketd --tracing-level debug