This repository contains the dbt (data build tool) project for Bandsintown's data analytics platform, integrated with AWS EMR Serverless and orchestrated via Apache Airflow.
Project: dbt Data Platform (DI-11)
Epic: Infrastructure Setup (DI-12)
Owner: Complicated Subsystem Team / Data Platform Team
This service transforms raw data from EMR ingestion pipelines into analytics-ready datasets using dbt Core, with models materialized in AWS Athena.
EMR Ingestion → S3 Raw Data → Athena (bandsintown_raw)
        ↓
dbt Transformations
(EMR Serverless)
        ↓
Athena Analytics Schema
(staging → intermediate → marts)
- Python 3.9+
- AWS Account with appropriate IAM permissions
- Access to Bandsintown AWS resources:
  - S3: s3://bandsintown-dbt-analytics/
  - Athena Workgroup: bandsintown-dbt-{env}
  - EMR Serverless Application
- Airflow environment (for production deployments)
# Clone the repository
git clone git@github.com:bandsintown/bit-dbt.git
cd bit-dbt
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
# Set AWS credentials, region, S3 paths, Athena workgroup, etc.
Required environment variables:
- AWS_REGION - AWS region (e.g., us-east-1)
- DBT_ATHENA_S3_STAGING_DIR - S3 path for Athena query results
- DBT_ATHENA_S3_DATA_DIR - S3 path for dbt table data
- DBT_ATHENA_DATABASE - Athena database name
- DBT_ATHENA_WORKGROUP - Athena workgroup name (EMR Serverless enabled)
- DBT_TARGET - Target environment (dev/staging/prod)
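Before running dbt, it can help to confirm that every variable in the list above is set. The following is a minimal sketch, not part of this repository: the function name check_dbt_env is hypothetical, and it only checks that each variable is non-empty, not that its value is valid.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): reports which of the
# required variables from the list above are unset or empty.
check_dbt_env() {
    missing=""
    for name in AWS_REGION DBT_ATHENA_S3_STAGING_DIR DBT_ATHENA_S3_DATA_DIR \
                DBT_ATHENA_DATABASE DBT_ATHENA_WORKGROUP DBT_TARGET; do
        # Indirect lookup of the variable whose name is in $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing required variables:$missing" >&2
        return 1
    fi
    echo "All required variables are set."
}
```

After loading .env into the current shell, calling check_dbt_env returns a non-zero status and names the missing variables on stderr if any are absent.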
# Set profiles directory
export DBT_PROFILES_DIR=$(pwd)
# Test connection to Athena
dbt debug
# Expected output: "Connection test: OK"
# Install dbt packages (if any)
dbt deps
# Run all models
dbt run
# Run specific models
dbt run --select stg_events
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve # View docs at http://localhost:8080
bit-dbt/
├── models/
│   ├── staging/ # Staging models (views)
│   │   └── bandsintown_raw/
│   │       ├── src_bandsintown_raw.yml
│   │       ├── stg_events.sql
│   │       └── stg_bandsintown_raw.yml
│   ├── intermediate/ # Intermediate business logic (views)
│   └── marts/ # Final analytics tables
├── macros/ # Custom dbt macros
├── tests/ # Custom data tests
├── seeds/ # CSV reference data
├── snapshots/ # SCD Type 2 snapshots
├── analyses/ # Ad-hoc SQL queries
├── airflow/
│   └── dags/
│       └── bandsintown_dbt_dag.py
├── dbt_project.yml # dbt project configuration
├── profiles.yml # dbt connection profiles
├── requirements.txt # Python dependencies
├── .env.example # Environment variable template
├── .gitignore
└── README.md
- Staging (models/staging/): Views - Fast, lightweight transformations
- Intermediate (models/intermediate/): Views - Business logic, reusable
- Marts (models/marts/): Tables - Final consumption layer
bandsintown_raw ← Source data (read-only)
└── events
bandsintown_analytics_{env}
├── staging → stg_events, stg_artists, etc.
├── intermediate → int_* models
└── analytics → dim_*, fct_* final tables
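The {env} placeholder in the analytics database name above, and in the bandsintown-dbt-{env} workgroup listed under the prerequisites, can be expanded per environment. This is a small sketch under two assumptions: the valid environments are the three DBT_TARGET values (dev, staging, prod), and the function name resolve_dbt_names is hypothetical rather than something this repository provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): expands the {env} placeholder
# in the database and workgroup naming patterns used in this README.
resolve_dbt_names() {
    env_name="${1:-}"
    case "$env_name" in
        dev|staging|prod) ;;
        *)
            echo "Unknown environment: $env_name (expected dev, staging, or prod)" >&2
            return 1
            ;;
    esac
    echo "database=bandsintown_analytics_${env_name}"
    echo "workgroup=bandsintown-dbt-${env_name}"
}
```

Calling resolve_dbt_names staging prints one database= line and one workgroup= line for that environment. Whether DBT_ATHENA_DATABASE and DBT_ATHENA_WORKGROUP should be set to exactly these values depends on profiles.yml, which is not shown here.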
The EMR Serverless execution role requires:
Athena Permissions:
- athena:StartQueryExecution
- athena:GetQueryExecution
- athena:GetQueryResults
- athena:StopQueryExecution
S3 Permissions:
- Read: s3://bandsintown-raw-data/*
- Read/Write: s3://bandsintown-dbt-analytics/*
Glue Permissions:
- glue:GetDatabase
- glue:GetTable
- glue:GetPartitions
- glue:CreateTable
- glue:UpdateTable
- glue:DeleteTable
See iam-policy-template.json for full policy.
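One way to spot-check that a role actually grants the Athena and Glue actions listed above is the IAM policy simulator in the AWS CLI. The sketch below is illustrative only: check_role_permissions is a hypothetical name, the role ARN must be supplied by the caller, and the caller needs iam:SimulatePrincipalPolicy. Because no resource ARNs are passed, the simulation runs against all resources, so results for resource-scoped statements (such as the S3 paths) may differ from real behavior.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): simulates the Athena and Glue
# actions listed in this README against the policies attached to a role.
check_role_permissions() {
    role_arn="${1:-}"
    if [ -z "$role_arn" ]; then
        echo "usage: check_role_permissions <role-arn>" >&2
        return 2
    fi
    # Prints one line per action: the action name and allowed/denied decision.
    aws iam simulate-principal-policy \
        --policy-source-arn "$role_arn" \
        --action-names \
            athena:StartQueryExecution athena:GetQueryExecution \
            athena:GetQueryResults athena:StopQueryExecution \
            glue:GetDatabase glue:GetTable glue:GetPartitions \
            glue:CreateTable glue:UpdateTable glue:DeleteTable \
        --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
        --output text
}
```

Any action reported as implicitDeny or explicitDeny is worth comparing against iam-policy-template.json.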
Run dbt transformations directly from the command line:
Basic Workflow:
# Install dependencies
dbt deps
# Test connection
dbt debug
# Check source data freshness
dbt source freshness
# Run transformations
dbt run
# Run data quality tests
dbt test
# Generate documentation
dbt docs generate
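dbt docs serve
Schedule with Cron (Optional):
# Add to crontab for daily runs at 6 AM
# (cron runs commands with /bin/sh, so use "." rather than "source")
0 6 * * * cd /path/to/bit-dbt && . .venv/bin/activate && dbt run && dbt test
# Run all tests
dbt test
# Check source data freshness
dbt source freshness
# Test a single model
dbt test --select stg_events
dbt tests ensure: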
- Primary keys are unique and not null
- Foreign key relationships are valid
- Accepted values match expected enums
- Source data freshness (< 24 hours)
- Custom business logic validations
export DBT_TARGET=dev
dbt run
export DBT_TARGET=staging
dbt run --full-refresh
Production runs are deployed via the Airflow DAG automatically after EMR ingestion completes.
Deploy the IAM permissions stack with:
make deploy-permissions STAGE=prod AWS_PROFILE=default AWS_REGION=us-east-1
There is also a GitHub Actions pipeline at .github/workflows/deploy-serverless-permissions.yml.
It deploys automatically on changes to the IAM Serverless config and can be run manually via workflow dispatch.
Buildkite upload_s3 now uploads:
- scripts/ to s3://bit-dbt-<env>/dags/dependencies/dbt/scripts/
- dbt project payload to s3://bit-dbt-<env>/dags/dependencies/dbt/project/
In Airflow/MWAA, use:
/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh run
/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh test
The helper script accepts additional dbt args, for example:
/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh build --select tag:daily
Generate and view dbt documentation:
dbt docs generate
dbt docs serve
Documentation artifacts are automatically uploaded to S3 after each production run:
- s3://bandsintown-dbt-analytics/docs/manifest.json
- s3://bandsintown-dbt-analytics/docs/catalog.json
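For reference, an upload of these two artifacts could look roughly like the sketch below. It is not the production pipeline's actual step, which this README does not show: upload_dbt_docs is a hypothetical name, and the sketch assumes dbt docs generate wrote its output to the default target/ directory.

```shell
#!/bin/sh
# Hypothetical helper (not the pipeline's real upload step): copies the
# artifacts written by `dbt docs generate` to the docs prefix named above.
upload_dbt_docs() {
    target_dir="${1:-target}"
    dest="s3://bandsintown-dbt-analytics/docs"
    # Fail early if either artifact has not been generated yet.
    for artifact in manifest.json catalog.json; do
        if [ ! -f "$target_dir/$artifact" ]; then
            echo "Missing $target_dir/$artifact; run 'dbt docs generate' first." >&2
            return 1
        fi
    done
    for artifact in manifest.json catalog.json; do
        aws s3 cp "$target_dir/$artifact" "$dest/$artifact" || return 1
    done
}
```

Running upload_dbt_docs from the project root after dbt docs generate copies both files. Pass a different directory as the first argument if the project overrides the target path.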
# Check AWS credentials
aws sts get-caller-identity
# Verify S3 access
aws s3 ls s3://bandsintown-dbt-analytics/
# Test Athena workgroup
aws athena get-work-group --work-group bandsintown-dbt-prod
# Clear cache and retry
dbt clean
dbt deps
dbt run
# Verbose logging
dbt run --debug
# Run single model with full refresh
dbt run --select stg_events --full-refresh
- Create a feature branch from main
- Make changes and test locally
- Submit PR with description and tests
- Require 2 approvals from data platform team
- Merge to main triggers deployment to staging
- Manual promotion to production
Team: Data Platform / Complicated Subsystem Team
Slack: #data-platform
Email: data-platform@bandsintown.com
- dbt Documentation
- dbt-athena-community
- Bandsintown Engineering Handbook
- EMR Serverless Documentation
Proprietary - Bandsintown, Inc.
Last Updated: May 14, 2026
Version: 1.0.0