# Snowplow web analytics
Build a complete web analytics pipeline using Embucket on AWS Lambda with the dbt-embucket adapter. You deploy a Snowplow analytics runtime, run dbt transformations, and inspect derived analytics tables. This tutorial follows the embucket-snowplow repository and runs without a Snowflake account.
## What you'll build

Snowplow provides an open-source behavioral data platform that captures granular, event-level web analytics. In this tutorial you connect Snowplow's dbt packages to Embucket and produce three derived tables:
- Page views — aggregated metrics for each page view event.
- Sessions — session-level summaries stitched from individual events.
- Users — user-level roll-ups across all sessions.
## Prerequisites

Before you begin, make sure you have the following:
- AWS credentials with permissions for Lambda, CloudFormation, IAM, and S3 Tables
- An S3 Table Bucket ARN (see the AWS Lambda deployment guide to create one)
- uv or another Python environment manager
- Git
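Before going further, a quick shell check (tool names taken from the list above) can confirm each required binary is on your `PATH`:

```shell
# Report whether each required CLI tool is installed and on PATH.
for tool in aws git uv; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```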
## Tutorial
1. **Clone the repository**

   Clone the Snowplow demo repository and change into the project directory:

   ```shell
   git clone https://github.com/Embucket/embucket-snowplow.git && cd embucket-snowplow
   ```
2. **Set deploy values**

   Define a unique stack name and your S3 Table Bucket ARN:

   ```shell
   STACK_NAME="embucket-demo-$(whoami)-$(date +%s)"
   BUCKET_ARN="arn:aws:s3tables:us-east-2:YOUR_ACCOUNT:bucket/YOUR_BUCKET"
   ```

   Replace `YOUR_ACCOUNT` and `YOUR_BUCKET` with your actual AWS account ID and bucket name.
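A typo in the ARN surfaces later as an opaque deploy failure, so it can help to sanity-check the value first. This sketch matches the ARN against the `arn:aws:s3tables:region:account:bucket/name` shape shown above; the regex is an assumption for illustration, not an official AWS grammar:

```shell
# Example value; in the tutorial this is your real $BUCKET_ARN.
BUCKET_ARN="arn:aws:s3tables:us-east-2:123456789012:bucket/demo-bucket"

# Rough shape check: s3tables service, region, 12-digit account, bucket/<name>.
if printf '%s' "$BUCKET_ARN" | grep -Eq '^arn:aws:s3tables:[a-z0-9-]+:[0-9]{12}:bucket/.+$'; then
  echo "ARN looks valid"
else
  echo "unexpected ARN format: $BUCKET_ARN" >&2
fi
```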
3. **Deploy the Lambda stack**

   Deploy the CloudFormation stack that provisions the Lambda function:

   ```shell
   aws cloudformation deploy \
     --template-file deploy/embucket-lambda.cfn.yaml \
     --stack-name "$STACK_NAME" \
     --capabilities CAPABILITY_NAMED_IAM \
     --parameter-overrides S3TableBucketArn="$BUCKET_ARN"
   ```

   After the stack deploys, capture the Lambda function ARN:

   ```shell
   LAMBDA_ARN=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
     --query 'Stacks[0].Outputs[?OutputKey==`LambdaFunctionArn`].OutputValue' \
     --output text)
   echo "$LAMBDA_ARN"
   ```
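If the stack output key is missing, the `describe-stacks` query can yield an empty string (or the literal text `None`), and later steps then fail with confusing errors. A hedged guard, shown here with an example ARN standing in for the real query result:

```shell
# Example value; in the tutorial this comes from the describe-stacks query above.
LAMBDA_ARN="arn:aws:lambda:us-east-2:123456789012:function:embucket-demo"

# Fail fast if the output was not captured.
if [ -z "$LAMBDA_ARN" ] || [ "$LAMBDA_ARN" = "None" ]; then
  echo "LambdaFunctionArn not found; check the stack deploy" >&2
  exit 1
fi
echo "Using Lambda: $LAMBDA_ARN"
```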
4. **Install dependencies**

   Install the Python dependencies with `uv`:

   ```shell
   uv sync
   ```
5. **Configure the dbt profile**

   Copy the example profile and substitute your Lambda ARN:

   ```shell
   cp profiles.yml.example profiles.yml
   sed -i '' "s|YOUR_LAMBDA_ARN_HERE|$LAMBDA_ARN|" profiles.yml
   ```
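Note that `sed -i ''` is BSD/macOS syntax; GNU sed on Linux expects `sed -i` with no separate empty argument. A portable sketch that avoids in-place editing entirely (the values here are examples, not the real file contents):

```shell
# Example ARN; in the tutorial this is the real $LAMBDA_ARN captured earlier.
LAMBDA_ARN="arn:aws:lambda:us-east-2:123456789012:function:embucket-demo"

# Substitute the placeholder without -i, so the same command works with BSD and GNU sed.
echo "lambda_arn: YOUR_LAMBDA_ARN_HERE" | sed "s|YOUR_LAMBDA_ARN_HERE|$LAMBDA_ARN|"
```

To edit a file portably, redirect the output to a temporary file and move it over the original.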
6. **Install dbt packages**

   Pull the Snowplow dbt packages:

   ```shell
   uv run dbt deps --profiles-dir .
   ```
7. **Patch packages for compatibility**

   Run the compatibility patch script:

   ```shell
   ./scripts/patch_snowplow.sh
   ```
8. **Load example data**

   Load sample Snowplow event data into your S3 Table Bucket:

   ```shell
   uv run python scripts/load_data.py "$LAMBDA_ARN"
   ```
9. **Run the pipeline**

   Seed reference data and run the dbt transformations:

   ```shell
   uv run dbt seed --profiles-dir .
   uv run dbt run --profiles-dir .
   ```
10. **Verify the results**

    Query the derived tables to confirm the pipeline completed successfully.

    View page views:

    ```shell
    uv run dbt show --profiles-dir . --inline \
      "SELECT * FROM demo.atomic_derived.snowplow_web_page_views" --limit 10
    ```

    View sessions:

    ```shell
    uv run dbt show --profiles-dir . --inline \
      "SELECT * FROM demo.atomic_derived.snowplow_web_sessions" --limit 10
    ```

    View users:

    ```shell
    uv run dbt show --profiles-dir . --inline \
      "SELECT * FROM demo.atomic_derived.snowplow_web_users" --limit 10
    ```
## Cleanup

Delete the CloudFormation stack to remove all deployed resources:

```shell
aws cloudformation delete-stack --stack-name "$STACK_NAME"
```