Configuration

This guide covers how to configure Embucket for your specific deployment needs. You can configure Embucket using environment variables, configuration files, or command-line options to control storage backends, networking, authentication, and performance settings.

Embucket supports three configuration methods, so you can adapt it to development, testing, and production environments. The methods are listed here from highest to lowest precedence:

  1. Command-line arguments - Highest precedence
  2. Environment variables - Medium precedence
  3. Configuration file (.env) - Lowest precedence
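
The precedence order above can be illustrated with a small shell function. This is only a sketch of the lookup order, not Embucket's actual implementation: `resolve_setting` is a hypothetical helper, and it assumes a plain `KEY=value` file without quoting or `export` prefixes.

```shell
# Hypothetical helper illustrating the documented precedence order.
# Usage: resolve_setting CLI_VALUE ENV_NAME ENV_FILE
resolve_setting() {
    cli_value=$1
    env_name=$2
    env_file=$3

    # 1. A command-line value, when given, wins.
    if [ -n "$cli_value" ]; then
        printf '%s\n' "$cli_value"
        return 0
    fi

    # 2. Otherwise use the environment variable, if it is set and non-empty.
    eval "env_value=\${$env_name:-}"
    if [ -n "$env_value" ]; then
        printf '%s\n' "$env_value"
        return 0
    fi

    # 3. Otherwise fall back to the first KEY=value line in the .env file.
    if [ -f "$env_file" ]; then
        sed -n "s/^${env_name}=//p" "$env_file" | head -n 1
    fi
}
```

For example, with `BUCKET_PORT=3000` in the file and `BUCKET_PORT=3001` in the environment, `resolve_setting "" BUCKET_PORT .env` prints the environment value, and passing a non-empty first argument overrides both.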

Create a .env file in your working directory with key-value pairs. Embucket automatically loads this file at startup.

# Basic configuration example
CATALOG_URL=http://127.0.0.1:3000
OBJECT_STORE_BACKEND=memory
JWT_SECRET=your-secret-key

Set environment variables in your shell or deployment environment. Environment variables override configuration file settings.

export OBJECT_STORE_BACKEND=s3
export AWS_REGION=us-east-1

Pass configuration options directly to the embucketd command. Command-line arguments override both environment variables and configuration file settings.

embucketd --backend s3 --port 3001

These settings control Embucket’s basic operation. Most deployments need them.

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --host | BUCKET_HOST | localhost | Host address to bind the API server |
| --port | BUCKET_PORT | 3000 | Port for the API server |
| --assets-port | WEB_ASSETS_PORT | 8080 | Port for the web UI assets server |

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --catalog-url | CATALOG_URL | http://127.0.0.1:3000 | URL where external clients can access the Iceberg catalog REST API |

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --jwt-secret | JWT_SECRET | None | Required secret key for JWT token generation |
| --auth-demo-user | AUTH_DEMO_USER | embucket | Username for demo authentication |
| --auth-demo-password | AUTH_DEMO_PASSWORD | embucket | Password for demo authentication |

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --cors-enabled | CORS_ENABLED | true | Enable Cross-Origin Resource Sharing |
| --cors-allow-origin | CORS_ALLOW_ORIGIN | http://localhost:8080 | Allowed origin for CORS requests |

Embucket uses SlateDB to store metadata and supports three storage backends. Choose the backend that matches your deployment environment and performance requirements.

The memory backend stores all data in memory. Use this for development, testing, or temporary deployments.

# Configuration
OBJECT_STORE_BACKEND=memory

Use cases:

  • Local development
  • Testing and CI/CD pipelines
  • Temporary data exploration

The file backend stores data in local files. Use this for single-node deployments with persistent storage.

# Configuration
OBJECT_STORE_BACKEND=file
FILE_STORAGE_PATH=./storage
SLATEDB_PREFIX=state

Use cases:

  • Single-node production deployments
  • Development with data persistence
  • Local testing with realistic data sizes

Requirements:

  • Persistent local storage
  • File system write permissions
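
Both requirements can be checked before starting the server. The function below is a minimal sketch: `check_storage_path` is a hypothetical helper, not part of Embucket, and it takes the directory you intend to use for FILE_STORAGE_PATH as its argument.

```shell
# Hypothetical preflight helper: make sure the storage directory
# exists and that the current user can write to it.
check_storage_path() {
    dir=$1

    # Create the directory (and any parents) if it does not exist yet.
    if ! mkdir -p "$dir" 2>/dev/null; then
        echo "error: cannot create $dir" >&2
        return 1
    fi

    # Prove write access by creating and removing a probe file.
    probe="$dir/.write-test.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "ok: $dir is writable"
    else
        echo "error: no write permission on $dir" >&2
        return 1
    fi
}
```

Calling `check_storage_path ./storage` before launch reports a permission problem early rather than at the first write.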

The S3 backend stores data in S3 or S3-compatible object storage. Use this for production deployments requiring scalability and durability.

# AWS S3 configuration
OBJECT_STORE_BACKEND=s3
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1
S3_BUCKET=your-embucket-data

# S3-compatible storage (MinIO, etc.)
OBJECT_STORE_BACKEND=s3
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
S3_BUCKET=your-embucket-data
S3_ENDPOINT=https://your-s3-endpoint.com
S3_ALLOW_HTTP=false
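
When pointing at an S3-compatible service, it can help to check that S3_ENDPOINT and S3_ALLOW_HTTP agree before starting the server. The sketch below rests on two assumptions the text above does not confirm: that S3_ALLOW_HTTP must be `true` for a plain `http://` endpoint to be accepted, and that it is treated as `false` when unset. `check_s3_endpoint` is a hypothetical helper.

```shell
# Hypothetical preflight helper. ASSUMPTION: S3_ALLOW_HTTP=true is needed
# for http:// endpoints, and an unset S3_ALLOW_HTTP behaves like false.
check_s3_endpoint() {
    endpoint=${S3_ENDPOINT:-}
    allow_http=${S3_ALLOW_HTTP:-false}

    case $endpoint in
        "")
            echo "ok: no S3_ENDPOINT set"
            ;;
        https://*)
            echo "ok: HTTPS endpoint"
            ;;
        http://*)
            if [ "$allow_http" = "true" ]; then
                echo "ok: plain HTTP endpoint explicitly allowed"
            else
                echo "error: http:// endpoint but S3_ALLOW_HTTP is not true" >&2
                return 1
            fi
            ;;
        *)
            echo "error: S3_ENDPOINT must start with http:// or https://" >&2
            return 1
            ;;
    esac
}
```

A local MinIO instance served over plain HTTP would fail this check until S3_ALLOW_HTTP is set to `true`.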

Use cases:

  • Production deployments
  • Multi-node or cloud deployments
  • High availability requirements
  • Large-scale data processing

Requirements:

  • S3 bucket with read/write permissions
  • Network connectivity to S3 endpoint

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --data-format | DATA_FORMAT | json | Data serialization format for Snowflake v1 API: json or arrow |

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| N/A | MEM_POOL_TYPE | greedy | Memory pool allocation strategy |
| N/A | MEM_POOL_SIZE_MB | 4096 | Memory pool size in megabytes |
| N/A | MEM_ENABLE_TRACK_CONSUMERS_POOL | true | Enable memory pool consumer tracking |
| N/A | DISK_POOL_SIZE_MB | 102400 | Disk pool size in megabytes |

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --no-bootstrap | N/A | false | Skip bootstrap process that creates default database and schema |

| Setting | Environment Variable | Default | Description |
|---|---|---|---|
| --tracing-level | TRACING_LEVEL | info | Log level: off, info, debug, or trace |
| --tracing-span-processor | span_processor | batch-span-processor | Tracing span processor type |

Minimal configuration for local development:

# .env file
OBJECT_STORE_BACKEND=memory
JWT_SECRET=dev-secret-key

Configuration for production deployment on a single server:

# .env file
OBJECT_STORE_BACKEND=file
FILE_STORAGE_PATH=/var/lib/embucket/storage
SLATEDB_PREFIX=prod-state
JWT_SECRET=secure-random-string
BUCKET_HOST=0.0.0.0
CORS_ALLOW_ORIGIN=https://your-domain.com
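
The JWT_SECRET value in these examples is a placeholder. One way to produce a random value is sketched below. `generate_secret` is a hypothetical helper, and the 32-byte length is an assumption chosen for illustration, not a documented Embucket requirement.

```shell
# Hypothetical helper: print 32 random bytes as 64 hexadecimal characters.
# The length is an illustrative choice, not an Embucket requirement.
generate_secret() {
    # Read random bytes, dump them as hex, then strip spaces and newlines.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}
```

The output can be placed in the configuration file, for example with `echo "JWT_SECRET=$(generate_secret)" >> .env`.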

Configuration for scalable cloud deployment:

# .env file
OBJECT_STORE_BACKEND=s3
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
S3_BUCKET=my-company-embucket-prod
JWT_SECRET=secure-random-string
BUCKET_HOST=0.0.0.0
CORS_ALLOW_ORIGIN=https://analytics.company.com
CATALOG_URL=https://embucket-api.company.com:3000

View all available command-line options:

embucketd --help

Common command patterns:

# Start with custom port and S3 backend
embucketd --port 3001 --backend s3 --bucket my-data-bucket

# Start with file backend and custom storage path
embucketd --backend file --file-storage-path /opt/embucket/data

# Start with debug logging
embucketd --tracing-level debug