Configuration
This guide covers how to configure Embucket for your specific deployment needs. You can configure Embucket using environment variables, configuration files, or command-line options to control storage backends, networking, authentication, and performance settings.
This flexibility lets you adapt Embucket to development, testing, and production environments.
Configuration methods
You can configure Embucket using three methods, listed from highest to lowest precedence. When the same setting is defined in more than one place, the source with the higher precedence wins.
- Command-line arguments - Highest precedence
- Environment variables - Medium precedence
- Configuration file (.env) - Lowest precedence
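The lookup order above can be sketched as a small shell function. resolve_setting is hypothetical and not part of Embucket; it only mimics the documented rule, using BUCKET_PORT and --port as the example setting. It also assumes that an empty value counts as "not set", which this guide does not specify.

```shell
#!/bin/sh
# Hypothetical helper: NOT part of Embucket. It mimics the documented
# precedence rule to show which source wins for a single setting.
# Arguments: CLI_VALUE ENV_VALUE FILE_VALUE DEFAULT_VALUE
# An empty argument is treated as "not set" (an assumption for this sketch).
resolve_setting() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"   # 1. command-line argument (highest precedence)
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"   # 2. environment variable
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"   # 3. .env configuration file (lowest precedence)
  else
    printf '%s\n' "$4"   # 4. nothing set: fall back to the built-in default
  fi
}

# Scenario: .env contains BUCKET_PORT=3002, the shell exports BUCKET_PORT=3003,
# embucketd is started with --port 3001, and the default port is 3000.
resolve_setting "3001" "3003" "3002" "3000"   # prints 3001
resolve_setting ""     "3003" "3002" "3000"   # prints 3003
resolve_setting ""     ""     "3002" "3000"   # prints 3002
resolve_setting ""     ""     ""     "3000"   # prints 3000
```

Each call drops the highest-precedence source from the previous line, so the printed port steps down from the command-line value to the default.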
Configuration file
Create a .env file in your working directory with key-value pairs. Embucket automatically loads this file at startup.
```shell
# Basic configuration example
CATALOG_URL=http://127.0.0.1:3000
OBJECT_STORE_BACKEND=memory
JWT_SECRET=your-secret-key
```
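To create the file from a shell, a heredoc keeps the key-value pairs exactly as written. The sketch below writes the basic example to ./.env and then restricts the file's permissions because it holds JWT_SECRET. The chmod step is a general precaution, not something Embucket is documented to require.

```shell
#!/bin/sh
# Write the basic configuration to ./.env.
# The quoted 'EOF' delimiter stops the shell from expanding $ or ` in values.
cat > .env <<'EOF'
CATALOG_URL=http://127.0.0.1:3000
OBJECT_STORE_BACKEND=memory
JWT_SECRET=your-secret-key
EOF

# The file holds a secret, so make it readable and writable by the owner only.
chmod 600 .env

# Show the resulting key-value pairs.
cat .env
```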
Environment variables
Set environment variables in your shell or deployment environment. Environment variables override configuration file settings.
```shell
export OBJECT_STORE_BACKEND=s3
export AWS_REGION=us-east-1
```
Command-line arguments
Pass configuration options directly to the embucketd command. Command-line arguments override both environment variables and configuration file settings.
```shell
embucketd --backend s3 --port 3001
```
Core settings
These settings control Embucket’s basic operation, and most deployments need them.
Network configuration
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--host | BUCKET_HOST | localhost | Host address to bind the API server |
--port | BUCKET_PORT | 3000 | Port for the API server |
--assets-port | WEB_ASSETS_PORT | 8080 | Port for the web UI assets server |
Catalog configuration
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--catalog-url | CATALOG_URL | http://127.0.0.1:3000 | URL where external clients can access the Iceberg catalog REST API |
Authentication
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--jwt-secret | JWT_SECRET | None | Required secret key for JWT token generation |
--auth-demo-user | AUTH_DEMO_USER | embucket | Username for demo authentication |
--auth-demo-password | AUTH_DEMO_PASSWORD | embucket | Password for demo authentication |
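JWT_SECRET has no default, so every deployment must supply one, and the production examples in this guide use the placeholder secure-random-string. One way to produce a real value, assuming a Unix-like system with /dev/urandom, is to hex-encode 32 random bytes. The 32-byte length is a common choice for HMAC-signed tokens rather than a documented Embucket requirement, and generate_jwt_secret is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a random 64-character hex string built from
# 32 bytes of /dev/urandom (assumes a Unix-like system).
generate_jwt_secret() {
  # od -An: no address column; -v: don't collapse repeated lines; -tx1: hex bytes.
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

JWT_SECRET=$(generate_jwt_secret)
export JWT_SECRET

# Report the length only; avoid printing the secret itself to logs.
echo "Generated a ${#JWT_SECRET}-character JWT_SECRET"
```

Store the generated value in your .env file or secret manager and reuse it across restarts; changing the secret will likely invalidate tokens that were already issued.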
CORS configuration
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--cors-enabled | CORS_ENABLED | true | Enable Cross-Origin Resource Sharing |
--cors-allow-origin | CORS_ALLOW_ORIGIN | http://localhost:8080 | Allowed origin for CORS requests |
Storage backends
Embucket uses SlateDB to store metadata and supports three storage backends. Choose the backend that matches your deployment environment and performance requirements.
Memory backend
Stores all data in memory, so nothing persists after Embucket stops. Use this for development, testing, or temporary deployments.
```shell
# Configuration
OBJECT_STORE_BACKEND=memory
```
Use cases:
- Local development
- Testing and CI/CD pipelines
- Temporary data exploration
File system backend
Stores data in local files. Use this for single-node deployments with persistent storage.
```shell
# Configuration
OBJECT_STORE_BACKEND=file
FILE_STORAGE_PATH=./storage
SLATEDB_PREFIX=state
```
Use cases:
- Single-node production deployments
- Development with data persistence
- Local testing with realistic data sizes
Requirements:
- Persistent local storage
- File system write permissions
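Both requirements can be checked from the shell before you start Embucket with the file backend. The sketch below creates the storage directory if it does not exist and confirms that the current user can write to it. check_storage_dir is a hypothetical helper, and because this guide does not say whether Embucket creates FILE_STORAGE_PATH on its own, creating it up front is the cautious option.

```shell
#!/bin/sh
# Hypothetical helper: create the file-backend storage directory if needed
# and verify that the current user can write to it.
check_storage_dir() {
  dir=$1
  if ! mkdir -p "$dir"; then
    echo "ERROR: cannot create $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "ERROR: $dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "OK: $dir exists and is writable"
}

# Use FILE_STORAGE_PATH if it is set, otherwise the ./storage path from the example.
check_storage_dir "${FILE_STORAGE_PATH:-./storage}"
```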
S3-compatible object storage
Stores data in S3 or S3-compatible object storage. Use this for production deployments requiring scalability and durability.
```shell
# AWS S3 configuration
OBJECT_STORE_BACKEND=s3
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1
S3_BUCKET=your-embucket-data
```

```shell
# S3-compatible storage (MinIO, etc.)
OBJECT_STORE_BACKEND=s3
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
S3_BUCKET=your-embucket-data
S3_ENDPOINT=https://your-s3-endpoint.com
S3_ALLOW_HTTP=false
```
Use cases:
- Production deployments
- Multi-node or cloud deployments
- High availability requirements
- Large-scale data processing
Requirements:
- S3 bucket with read/write permissions
- Network connectivity to S3 endpoint
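A missing credential or bucket name is easier to diagnose before embucketd starts. The hypothetical check_s3_env function below reports any of the S3 variables shared by both examples that is unset or empty. It treats AWS_REGION and S3_ENDPOINT as optional, because each appears in only one of the two examples, and it does not test network access or bucket permissions.

```shell
#!/bin/sh
# Hypothetical preflight check: NOT part of Embucket.
# Returns non-zero and lists each required S3 variable that is unset or empty.
check_s3_env() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_BUCKET; do
    # POSIX sh has no ${!var}, so eval reads the variable named by $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Only run the check when the S3 backend is selected.
if [ "${OBJECT_STORE_BACKEND:-}" = "s3" ]; then
  check_s3_env && echo "S3 settings present"
fi
```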
Advanced settings
Data format
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--data-format | DATA_FORMAT | json | Data serialization format for Snowflake v1 API: json or arrow |
Memory and disk pools
Command-line option | Environment variable | Default | Description |
---|---|---|---|
N/A | MEM_POOL_TYPE | greedy | Memory pool allocation strategy |
N/A | MEM_POOL_SIZE_MB | 4096 | Memory pool size in megabytes |
N/A | MEM_ENABLE_TRACK_CONSUMERS_POOL | true | Enable memory pool consumer tracking |
N/A | DISK_POOL_SIZE_MB | 102400 | Disk pool size in megabytes |
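Both pool sizes are given in megabytes: the defaults work out to 4096 MB = 4 × 1024 MB (4 GB) of memory and 102400 MB = 100 × 1024 MB (100 GB) of disk. To size a pool from a gigabyte figure, shell arithmetic can do the conversion before you export the variable. gb_to_mb is a hypothetical helper that assumes the 1024-based convention the defaults suggest. A .env file is not evaluated by a shell, so write the computed number there rather than a $(( )) expression.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GB to the MB value the pool
# variables expect, using 1 GB = 1024 MB (the convention the defaults follow).
gb_to_mb() {
  echo $(( $1 * 1024 ))
}

# Example: an 8 GB memory pool and a 200 GB disk pool.
MEM_POOL_SIZE_MB=$(gb_to_mb 8)
DISK_POOL_SIZE_MB=$(gb_to_mb 200)
export MEM_POOL_SIZE_MB DISK_POOL_SIZE_MB

echo "MEM_POOL_SIZE_MB=$MEM_POOL_SIZE_MB DISK_POOL_SIZE_MB=$DISK_POOL_SIZE_MB"
# prints: MEM_POOL_SIZE_MB=8192 DISK_POOL_SIZE_MB=204800
```

Keep the memory pool below the RAM actually available to the Embucket process, leaving headroom for the operating system.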
Bootstrapping
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--no-bootstrap | N/A | false | Skip bootstrap process that creates default database and schema |
Tracing and debugging
Command-line option | Environment variable | Default | Description |
---|---|---|---|
--tracing-level | TRACING_LEVEL | info | Log level: off, info, debug, or trace |
--tracing-span-processor | span_processor | batch-span-processor | Tracing span processor type |
Configuration examples
Development setup
Minimal configuration for local development:
```shell
# .env file
OBJECT_STORE_BACKEND=memory
JWT_SECRET=dev-secret-key
```
Single-node production
Configuration for a production deployment on a single server:
```shell
# .env file
OBJECT_STORE_BACKEND=file
FILE_STORAGE_PATH=/var/lib/embucket/storage
SLATEDB_PREFIX=prod-state
JWT_SECRET=secure-random-string
BUCKET_HOST=0.0.0.0
CORS_ALLOW_ORIGIN=https://your-domain.com
```
Deployment with S3
Configuration for a scalable cloud deployment:
```shell
# .env file
OBJECT_STORE_BACKEND=s3
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
S3_BUCKET=my-company-embucket-prod
JWT_SECRET=secure-random-string
BUCKET_HOST=0.0.0.0
CORS_ALLOW_ORIGIN=https://analytics.company.com
CATALOG_URL=https://embucket-api.company.com:3000
```
Command-line reference
View all available command-line options:
```shell
embucketd --help
```
Common command patterns:
```shell
# Start with custom port and S3 backend
embucketd --port 3001 --backend s3 --bucket my-data-bucket

# Start with file backend and custom storage path
embucketd --backend file --file-storage-path /opt/embucket/data

# Start with debug logging
embucketd --tracing-level debug
```