
Snowplow web analytics

Build a complete web analytics pipeline using Embucket on AWS Lambda with the dbt-embucket adapter. You deploy a Snowplow analytics runtime, run dbt transformations, and inspect derived analytics tables. This tutorial follows the embucket-snowplow repository and runs without a Snowflake account.

Snowplow provides an open source behavioral data platform that captures granular, event-level web analytics. In this tutorial you connect Snowplow’s dbt packages to Embucket and produce three derived tables:

  • Page views — aggregated metrics for each page view event.
  • Sessions — session-level summaries stitched from individual events.
  • Users — user-level roll-ups across all sessions.

Before you begin, make sure you have the following:

  • AWS credentials with permissions for Lambda, CloudFormation, IAM, and S3 Tables
  • An S3 Table Bucket ARN (see the AWS Lambda deployment guide to create one)
  • uv or another Python environment manager
  • Git

  1. Clone the repository

    Clone the Snowplow demo repository and change into the project directory:

    git clone https://github.com/Embucket/embucket-snowplow.git && cd embucket-snowplow
  2. Set deploy values

    Define a unique stack name and your S3 Table Bucket ARN:

    STACK_NAME="embucket-demo-$(whoami)-$(date +%s)"
    BUCKET_ARN="arn:aws:s3tables:us-east-2:YOUR_ACCOUNT:bucket/YOUR_BUCKET"

    Replace YOUR_ACCOUNT and YOUR_BUCKET with your actual AWS account ID and bucket name.
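
    If you want to sanity-check the ARN before deploying, you can split it apart with plain shell. The values below are placeholders: a table-bucket ARN carries the account ID in its fifth colon-separated field and the bucket name after the final slash.

    ```shell
    # Placeholder ARN for illustration only.
    BUCKET_ARN="arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket"

    # The fifth colon-separated field is the account ID.
    ACCOUNT_ID=$(printf '%s' "$BUCKET_ARN" | cut -d: -f5)

    # Everything after the last slash is the bucket name.
    BUCKET_NAME=${BUCKET_ARN##*/}

    echo "account=$ACCOUNT_ID bucket=$BUCKET_NAME"
    # → account=123456789012 bucket=my-table-bucket
    ```

    If either value looks wrong here, fix BUCKET_ARN before running the deploy in the next step.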

  3. Deploy the Lambda stack

    Deploy the CloudFormation stack that provisions the Lambda function:

    aws cloudformation deploy \
    --template-file deploy/embucket-lambda.cfn.yaml \
    --stack-name "$STACK_NAME" \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides S3TableBucketArn="$BUCKET_ARN"

    After the stack deploys, capture the Lambda function ARN:

    LAMBDA_ARN=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
    --query 'Stacks[0].Outputs[?OutputKey==`LambdaFunctionArn`].OutputValue' \
    --output text)
    echo "$LAMBDA_ARN"
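
    When the output key is missing, describe-stacks prints nothing and LAMBDA_ARN ends up empty, which only surfaces later as a confusing dbt failure. A minimal shape check, shown here with a placeholder value:

    ```shell
    # Placeholder; in the real flow this comes from describe-stacks above.
    LAMBDA_ARN="arn:aws:lambda:us-east-2:123456789012:function:embucket-demo"

    # Fail fast unless the value looks like a Lambda ARN.
    case "$LAMBDA_ARN" in
      arn:aws:lambda:*) echo "ok: $LAMBDA_ARN" ;;
      *) echo "unexpected LAMBDA_ARN: '$LAMBDA_ARN'" >&2; exit 1 ;;
    esac
    ```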
  4. Install dependencies

    Install the Python dependencies with uv:

    uv sync
  5. Configure the dbt profile

    Copy the example profile and substitute your Lambda ARN:

    cp profiles.yml.example profiles.yml
    sed -i '' "s|YOUR_LAMBDA_ARN_HERE|$LAMBDA_ARN|" profiles.yml

    The empty string after -i is the BSD/macOS form; on Linux, GNU sed expects sed -i with no empty argument.
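
    Because the in-place flag differs between sed implementations, a portable alternative is to write to a new file instead of editing in place. In this sketch, the one-line profiles.yml.example is a stand-in for the real file shipped in the repository, and the ARN is a placeholder:

    ```shell
    # Stand-in for the repository's profiles.yml.example.
    printf 'lambda_arn: YOUR_LAMBDA_ARN_HERE\n' > profiles.yml.example

    # Placeholder; in the real flow this is the captured LAMBDA_ARN.
    LAMBDA_ARN="arn:aws:lambda:us-east-2:123456789012:function:embucket-demo"

    # Substitute into a new file rather than editing in place.
    sed "s|YOUR_LAMBDA_ARN_HERE|$LAMBDA_ARN|" profiles.yml.example > profiles.yml

    grep -c "$LAMBDA_ARN" profiles.yml
    # → 1
    ```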
  6. Install dbt packages

    Pull the Snowplow dbt packages:

    uv run dbt deps --profiles-dir .
  7. Patch packages for compatibility

    Run the compatibility patch script:

    ./scripts/patch_snowplow.sh
  8. Load example data

    Load sample Snowplow event data into your S3 Table Bucket:

    uv run python scripts/load_data.py "$LAMBDA_ARN"
  9. Run the pipeline

    Seed reference data and run the dbt transformations:

    uv run dbt seed --profiles-dir .
    uv run dbt run --profiles-dir .
  10. Verify the results

    Query the derived tables to confirm the pipeline completed successfully.

    View page views:

    uv run dbt show --profiles-dir . --inline \
    "SELECT * FROM demo.atomic_derived.snowplow_web_page_views" --limit 10

    View sessions:

    uv run dbt show --profiles-dir . --inline \
    "SELECT * FROM demo.atomic_derived.snowplow_web_sessions" --limit 10

    View users:

    uv run dbt show --profiles-dir . --inline \
    "SELECT * FROM demo.atomic_derived.snowplow_web_users" --limit 10

When you are finished, delete the CloudFormation stack to remove all deployed resources. The S3 Table Bucket was created outside the stack (it is passed in as a parameter), so it is not deleted by this command:

aws cloudformation delete-stack --stack-name "$STACK_NAME"