Volumes

Volumes are database objects that define where Embucket stores your data. A volume holds the configuration for an object storage system such as Amazon S3, including the storage location, access credentials, and backend type. You create databases that use these volumes to store their tables and metadata. This document covers how to configure and manage volumes in your Embucket deployment.

Key topics covered:

  • Volume types and their use cases
  • Managing volumes
  • Database-to-volume relationships and management

Each volume serves as a storage configuration object that contains:

  • Storage location: The object storage bucket or path where Embucket stores data
  • Access credentials: Authentication information for the storage system
  • Storage backend: The storage system type, one of S3, S3 table bucket, filesystem, or memory

Volumes serve as the foundation for databases. Every database requires a volume to store its tables and metadata. You can reuse a single volume across many databases.

Embucket supports four volume types:

S3 volumes store data in S3-compatible object storage systems, which expose the Amazon S3 API for data access and management. Use S3 volumes for production deployments.

Supported storage systems:

  • Amazon S3
  • Any S3-compatible storage system

S3 tables volumes store data in AWS S3 table buckets. Unlike standard S3 volumes, which work with any S3-compatible storage, S3 tables volumes integrate specifically with AWS S3 table buckets for optimized performance. Use S3 tables volumes for AWS-native production deployments.

Supported storage systems:

  • Amazon S3 table buckets

Filesystem volumes store data on the local file system. Unlike cloud-based volumes, filesystem volumes write data directly to disk paths on the server where Embucket runs. Use filesystem volumes only for development and testing.

Memory volumes store data entirely in system memory (RAM). Unlike persistent storage volumes, memory volumes lose all data when the system restarts. Use memory volumes only for temporary testing.

You can create volumes using the Embucket UI, REST API, or SQL interface.

curl -X POST http://localhost:3000/v1/metastore/volumes \
  -H "Content-Type: application/json" \
  -d '{
    "ident": "production-volume",
    "type": "s3",
    "bucket": "my-data-bucket",
    "endpoint": "https://s3.amazonaws.com",
    "credentials": {
      "credential_type": "access_key",
      "aws-access-key-id": "AKIAIOSFODNN7EXAMPLE",
      "aws-secret-access-key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    }
  }'

Every volume requires the following attributes:

  • ident: A unique identifier for the volume
  • type: The volume type, for example s3 for an S3 volume or memory for a memory volume
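As a minimal sketch of these two attributes in use, the script below builds and prints the request body for a memory volume. It assumes a memory volume needs no fields beyond ident and type, which the source does not confirm. The name scratch-volume, the EMBUCKET_URL variable, and both helper functions are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for a volume that needs only
# the two required attributes, ident and type.
volume_payload() {
  printf '{"ident": "%s", "type": "%s"}\n' "$1" "$2"
}

# Hypothetical helper: POST the body to the volumes endpoint shown earlier.
# EMBUCKET_URL is an assumed override; it defaults to the documented host.
create_volume() {
  curl -X POST "${EMBUCKET_URL:-http://localhost:3000}/v1/metastore/volumes" \
    -H "Content-Type: application/json" \
    -d "$(volume_payload "$1" "$2")"
}

# Print the body that would be sent for a memory volume.
volume_payload "scratch-volume" "memory"
```

Running create_volume "scratch-volume" "memory" against a live server would send that body. The script itself only prints the body, so you can inspect it first.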

After creating a volume, you can create databases that use that volume to store data.

You can create a database using the REST API, SQL, or UI.

curl -X POST http://localhost:3000/v1/metastore/databases \
  -H "Content-Type: application/json" \
  -d '{
    "ident": "analytics_db",
    "volume": "production-volume"
  }'

You can create more databases using the same volume. Each database has its own schema and tables.
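The following sketch shows one way to place several databases on a single volume by repeating the database request in a loop. The database name reporting_db, the EMBUCKET_URL variable, and the helper functions are hypothetical. The endpoint and field names come from the example above.

```shell
#!/bin/sh
# Hypothetical helper: JSON body for a database bound to a volume.
database_payload() {
  printf '{"ident": "%s", "volume": "%s"}\n' "$1" "$2"
}

# Create each named database on the same volume.
# Usage: create_databases VOLUME DB [DB...]
create_databases() {
  volume=$1
  shift
  for db in "$@"; do
    curl -X POST "${EMBUCKET_URL:-http://localhost:3000}/v1/metastore/databases" \
      -H "Content-Type: application/json" \
      -d "$(database_payload "$db" "$volume")"
  done
}

# Preview the bodies for two databases that share production-volume.
for db in analytics_db reporting_db; do
  database_payload "$db" "production-volume"
done
```

Calling create_databases "production-volume" analytics_db reporting_db would send both requests. The preview loop only prints the bodies.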

To list all volumes:

curl http://localhost:3000/v1/metastore/volumes

To update the credentials of a volume:

curl -X PUT http://localhost:3000/v1/metastore/volumes/production-volume \
  -H "Content-Type: application/json" \
  -d '{
    "credentials": {
      "credential_type": "access_key",
      "aws-access-key-id": "NEW_ACCESS_KEY",
      "aws-secret-access-key": "NEW_SECRET_KEY"
    }
  }'

To delete a volume:

curl -X DELETE http://localhost:3000/v1/metastore/volumes/volume-name
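When updating credentials, you may prefer not to type keys into the command line. The sketch below reads them from the standard AWS environment variables and builds the same request body as the PUT example. The helper functions and the EMBUCKET_URL variable are hypothetical, and the body is assumed to match what the update endpoint accepts.

```shell
#!/bin/sh
# Build the credentials body from the environment instead of literals.
# Stops with a message if either variable is unset or empty.
# Note: values are inserted as-is, so they must not contain characters
# that need JSON escaping (AWS access keys normally do not).
credentials_payload() {
  : "${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID}"
  : "${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY}"
  printf '{"credentials": {"credential_type": "access_key", "aws-access-key-id": "%s", "aws-secret-access-key": "%s"}}\n' \
    "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
}

# Hypothetical helper: send the body to the named volume.
# Usage: update_credentials VOLUME
update_credentials() {
  body=$(credentials_payload) || return 1
  curl -X PUT "${EMBUCKET_URL:-http://localhost:3000}/v1/metastore/volumes/$1" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

With both variables set, update_credentials "production-volume" would send the request. Sourcing the script defines the functions without contacting the server.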

Production deployments:

  • Use S3 or S3 table bucket volumes for all production data
  • Configure Identity and Access Management (IAM) policies that grant only the permissions required for S3 bucket access
  • Use separate volumes for different environments: production, staging, development
  • Regularly back up your S3 buckets

Development and testing:

  • Use filesystem or memory volumes for local development
  • Use MinIO for development environments that need S3 compatibility
  • Avoid using production volumes for testing
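As a rough sketch of the MinIO suggestion above, an S3 volume can point its endpoint at a local MinIO server instead of Amazon S3. The script assumes MinIO is running with its default API port (9000) and default credentials (minioadmin), and that a bucket named dev-bucket already exists. The volume name, bucket name, and helper function are hypothetical. The source does not say whether Embucket needs further settings for MinIO.

```shell
#!/bin/sh
# Assumed local MinIO defaults; override them to match your setup.
MINIO_ENDPOINT="${MINIO_ENDPOINT:-http://localhost:9000}"
MINIO_ACCESS_KEY="${MINIO_ACCESS_KEY:-minioadmin}"
MINIO_SECRET_KEY="${MINIO_SECRET_KEY:-minioadmin}"

# Hypothetical helper: body for an S3 volume backed by MinIO.
# Usage: minio_volume_payload IDENT BUCKET
minio_volume_payload() {
  printf '{"ident": "%s", "type": "s3", "bucket": "%s", "endpoint": "%s", "credentials": {"credential_type": "access_key", "aws-access-key-id": "%s", "aws-secret-access-key": "%s"}}\n' \
    "$1" "$2" "$MINIO_ENDPOINT" "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
}

# Print the body; pass it to the volume creation request with curl -d.
minio_volume_payload "dev-volume" "dev-bucket"
```

The printed body has the same fields as the S3 volume creation example. Only the endpoint and credentials differ.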

Volume organization:

  • Create separate volumes for different data domains or teams
  • Use descriptive volume names that show their purpose
  • Document volume configurations and access policies