
Configuration

This page covers every configuration option for Embucket, including CLI flags, environment variables, metastore YAML, and deploy-time settings. It doesn’t cover connection setup for clients such as the Snowflake CLI or dbt.

Embucket resolves each setting in the following order, from highest to lowest priority:

  1. CLI flags — passed directly to the binary.
  2. Environment variables — exported in the shell or set in the container.
  3. .env file — loaded once at startup from the working directory.

When the same setting appears at more than one level, the higher-priority source wins. If no source sets a value, Embucket uses the default listed in the table below.
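The precedence rules above can be sketched in a few lines of Python. This is an illustration, not Embucket's implementation: `parse_dotenv` and `resolve_setting` are hypothetical helpers, and the .env parser handles only simple KEY=VALUE lines. Real .env loaders also accept forms such as `export KEY=...` and escape sequences.

```python
from typing import Mapping, Optional


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blank lines and # comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. BUCKET_HOST="0.0.0.0"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(
    flag: str,
    env_var: str,
    cli_args: Mapping[str, str],
    environ: Mapping[str, str],
    dotenv: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the value from the highest-priority source that sets it."""
    if flag in cli_args:      # 1. CLI flag
        return cli_args[flag]
    if env_var in environ:    # 2. process environment
        return environ[env_var]
    if env_var in dotenv:     # 3. .env file
        return dotenv[env_var]
    return default            # nothing set: fall back to the documented default


if __name__ == "__main__":
    dotenv = parse_dotenv("BUCKET_PORT=4000\nBUCKET_HOST=0.0.0.0\n")
    cli = {"--port": "5000"}
    env = {"BUCKET_HOST": "127.0.0.1"}
    print(resolve_setting("--port", "BUCKET_PORT", cli, env, dotenv, "3000"))  # → 5000
    print(resolve_setting("--host", "BUCKET_HOST", cli, env, dotenv, "localhost"))  # → 127.0.0.1
```

In the usage example, the port comes from the CLI flag even though both the .env file and the default also define it, and the host comes from the environment because no --host flag was passed.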

| Purpose | Flag | Environment variable | Default |
|---|---|---|---|
| Metastore config path | --metastore-config | METASTORE_CONFIG | unset |
| Bind host | --host | BUCKET_HOST | localhost |
| Bind port | --port | BUCKET_PORT | 3000 |
| Result serialization | --data-format | DATA_FORMAT | json |
| Parser dialect | --sql-parser-dialect | SQL_PARSER_DIALECT | snowflake |
| Query concurrency | --max-concurrency-level | MAX_CONCURRENCY_LEVEL | 8 |
| Query timeout (seconds) | --query-timeout-secs | QUERY_TIMEOUT_SECS | 1200 (20 minutes) |
| Demo user | --auth-demo-user | AUTH_DEMO_USER | embucket |
| Demo password | --auth-demo-password | AUTH_DEMO_PASSWORD | embucket |
| JWT signing secret | --jwt-secret | JWT_SECRET | unset |
| Tracing level | --tracing-level | TRACING_LEVEL | info |
| Service idle timeout (seconds) | --idle-timeout-seconds | IDLE_TIMEOUT_SECONDS | 18000 (5 hours) |

Embucket supports two ways to configure the metastore: a YAML configuration file or environment variables. The YAML file supports multiple volumes and full schema and table definitions. Environment variables configure a single volume and work well for simple deployments.

Set METASTORE_CONFIG to the path of a YAML file that defines volumes, databases, schemas, and tables. Embucket reads this file at startup and registers every declared object.

A minimal configuration with no external volumes:

```yaml
volumes: []
```

An S3 Tables volume:

```yaml
volumes:
  - ident: embucket
    type: s3-tables
    database: demo
    credentials:
      credential_type: access_key
      aws-access-key-id: ACCESS_KEY
      aws-secret-access-key: SECRET_ACCESS_KEY
    arn: arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket
```
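The arn value identifies the table bucket, so a typo there is easy to miss. As a sanity check before startup, you can split it into its parts. This sketch assumes the standard table-bucket ARN layout arn:PARTITION:s3tables:REGION:ACCOUNT_ID:bucket/BUCKET_NAME, as in the example above; `parse_table_bucket_arn` is a hypothetical helper, not an Embucket API.

```python
from typing import NamedTuple


class TableBucketArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    bucket_name: str


def parse_table_bucket_arn(arn: str) -> TableBucketArn:
    """Split an S3 Tables table-bucket ARN into its components or raise ValueError."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3tables":
        raise ValueError(f"not an S3 Tables ARN: {arn!r}")
    _, partition, _, region, account_id, resource = parts
    # The resource part must look like bucket/NAME.
    kind, sep, bucket_name = resource.partition("/")
    if kind != "bucket" or not sep or not bucket_name:
        raise ValueError(f"expected a bucket/NAME resource, got {resource!r}")
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"account ID must be 12 digits, got {account_id!r}")
    return TableBucketArn(partition, region, account_id, bucket_name)


if __name__ == "__main__":
    arn = "arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket"
    print(parse_table_bucket_arn(arn))
    # → TableBucketArn(partition='aws', region='us-east-2', account_id='123456789012', bucket_name='my-table-bucket')
```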

External Iceberg tables on S3:

```yaml
volumes:
  - ident: lakehouse
    type: s3
    region: us-east-2
    bucket: YOUR_BUCKET_NAME
    credentials:
      credential_type: access_key
      aws-access-key-id: YOUR_ACCESS_KEY
      aws-secret-access-key: YOUR_SECRET_KEY

databases:
  - ident: demo
    volume: lakehouse

schemas:
  - database: demo
    schema: tpch_10

tables:
  - database: demo
    schema: tpch_10
    table: customer
    metadata_location: s3://YOUR_BUCKET_NAME/tpch_10/customer/metadata/00001.metadata.json
```
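Each database names a volume, each schema names a database, and each table names a database and schema, so a typo in any identifier breaks the chain. One way to catch that before startup is to parse the file with a YAML library (for example PyYAML's yaml.safe_load) and check the links. The sketch below works on the already-parsed dictionary so it needs only the standard library. `check_references` is a hypothetical helper; it assumes the key names used in the examples above, assumes every referenced object must be declared in the same file, and treats the database key on an S3 Tables volume as declaring that database. The .metadata.json suffix check reflects the usual Iceberg file naming, not a documented Embucket rule.

```python
from typing import Any


def check_references(config: dict[str, Any]) -> list[str]:
    """Return problems found in a parsed metastore config (empty list means OK)."""
    problems: list[str] = []
    volumes = config.get("volumes") or []
    volume_ids = {v["ident"] for v in volumes}

    # S3 Tables volumes name their database inline; others come from `databases`.
    db_ids = {v["database"] for v in volumes if "database" in v}
    for db in config.get("databases") or []:
        db_ids.add(db["ident"])
        if db["volume"] not in volume_ids:
            problems.append(f"database {db['ident']!r}: unknown volume {db['volume']!r}")

    schema_ids = set()
    for s in config.get("schemas") or []:
        schema_ids.add((s["database"], s["schema"]))
        if s["database"] not in db_ids:
            problems.append(f"schema {s['schema']!r}: unknown database {s['database']!r}")

    for t in config.get("tables") or []:
        name = f"{t['database']}.{t['schema']}.{t['table']}"
        if (t["database"], t["schema"]) not in schema_ids:
            problems.append(f"table {name}: schema not declared")
        if not t.get("metadata_location", "").endswith(".metadata.json"):
            problems.append(f"table {name}: metadata_location is not a *.metadata.json file")
    return problems


if __name__ == "__main__":
    config = {
        "volumes": [{"ident": "lakehouse", "type": "s3", "bucket": "YOUR_BUCKET_NAME"}],
        "databases": [{"ident": "demo", "volume": "lakehouse"}],
        "schemas": [{"database": "demo", "schema": "tpch_10"}],
        "tables": [{
            "database": "demo", "schema": "tpch_10", "table": "customer",
            "metadata_location": "s3://YOUR_BUCKET_NAME/tpch_10/customer/metadata/00001.metadata.json",
        }],
    }
    print(check_references(config))  # → []
```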

When you enable the state-store-query feature, Embucket persists query state in DynamoDB. See AWS Lambda deployment for how to enable this feature. Configure the table name and connection with the following variables.

| Environment variable | Default |
|---|---|
| STATESTORE_TABLE_NAME | embucket-statestore |
| STATESTORE_DYNAMODB_ENDPOINT | unset |
| AWS_DDB_ACCESS_KEY_ID | unset |
| AWS_DDB_SECRET_ACCESS_KEY | unset |
| AWS_DDB_SESSION_TOKEN | unset |
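If you want to inspect the state-store table yourself, for example against a local DynamoDB endpoint, these variables map onto the keyword arguments of a boto3 DynamoDB client. The sketch below is a hypothetical helper that only builds those arguments from the environment; using them assumes boto3 is installed and a region is configured through the usual AWS settings. How Embucket itself behaves when the AWS_DDB_* variables are unset is not specified on this page.

```python
import os
from typing import Mapping


def statestore_client_kwargs(environ: Mapping[str, str]) -> tuple[str, dict[str, str]]:
    """Return (table_name, kwargs for boto3.client("dynamodb", **kwargs))."""
    table_name = environ.get("STATESTORE_TABLE_NAME", "embucket-statestore")
    # Documented variable -> matching boto3.client keyword argument.
    mapping = {
        "STATESTORE_DYNAMODB_ENDPOINT": "endpoint_url",
        "AWS_DDB_ACCESS_KEY_ID": "aws_access_key_id",
        "AWS_DDB_SECRET_ACCESS_KEY": "aws_secret_access_key",
        "AWS_DDB_SESSION_TOKEN": "aws_session_token",
    }
    # Unset or empty variables are left out so boto3 uses its own defaults.
    kwargs = {arg: environ[var] for var, arg in mapping.items() if environ.get(var)}
    return table_name, kwargs


if __name__ == "__main__":
    table, kwargs = statestore_client_kwargs(os.environ)
    print(table, sorted(kwargs))  # prints argument names only, never secret values
    # With boto3 installed (assumption):
    # import boto3
    # client = boto3.client("dynamodb", **kwargs)
    # print(client.describe_table(TableName=table)["Table"]["TableStatus"])
```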

These variables control the Lambda packaging and deployment process. Set them before you run the deploy script.

| Environment variable | Default |
|---|---|
| FUNCTION_NAME | embucket-lambda |
| ENV_FILE | config/.env.lambda |
| AWS_LAMBDA_ROLE_ARN | unset |
| WITH_OTEL_CONFIG | unset |
| FEATURES | unset |
| LAYERS | unset |

Use these variables to adjust memory pools and network timeouts for your workload.

| Environment variable | Default |
|---|---|
| MEM_POOL_TYPE | greedy |
| MEM_POOL_SIZE_MB | unset |
| DISK_POOL_SIZE_MB | unset |
| AWS_SDK_CONNECT_TIMEOUT_SECS | 3 |
| AWS_SDK_OPERATION_TIMEOUT_SECS | 30 |
| OBJECT_STORE_TIMEOUT_SECS | 10 |
| OBJECT_STORE_CONNECT_TIMEOUT_SECS | 3 |

Embucket ships with demo credentials for local development: both the default username and the default password are embucket. Override them with AUTH_DEMO_USER and AUTH_DEMO_PASSWORD (or the --auth-demo-user and --auth-demo-password flags) before you expose the service to any network beyond localhost.
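A deployment pipeline can enforce this rule with a pre-flight check. The sketch below is a hypothetical helper, not an Embucket feature: it reads BUCKET_HOST, AUTH_DEMO_USER, and AUTH_DEMO_PASSWORD from the environment with the documented defaults, and fails if the bind host is not a loopback address while either credential is still embucket. It does not see values passed as CLI flags.

```python
import ipaddress
import os
import sys
from typing import Mapping, Optional


def is_loopback(host: str) -> bool:
    """True for "localhost" or a loopback IP such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as externally reachable


def demo_credentials_problem(environ: Mapping[str, str]) -> Optional[str]:
    """Return an error message if the default demo credentials would be exposed."""
    host = environ.get("BUCKET_HOST", "localhost")
    user = environ.get("AUTH_DEMO_USER", "embucket")
    password = environ.get("AUTH_DEMO_PASSWORD", "embucket")
    if is_loopback(host):
        return None
    if user == "embucket" or password == "embucket":
        return (f"BUCKET_HOST={host} is reachable beyond localhost but the demo "
                "credentials are still the defaults; set AUTH_DEMO_USER and AUTH_DEMO_PASSWORD")
    return None


if __name__ == "__main__":
    problem = demo_credentials_problem(os.environ)
    if problem:
        sys.exit(problem)  # prints the message to stderr and exits with status 1
    print("demo credentials OK")
```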