A production-style MVP for a Reverse ETL platform that syncs customer data from data warehouses (Redshift/DuckDB) to marketing platforms (Braze/Attentive).
To safely support long-term goals (website + product split, customer one-click deployment, config-first architecture), a non-breaking scaffold has been added:
apps/website/*(future website frontend/backend/admin-config)apps/product/*(future product frontend/backend/admin-config)control-plane/*(future tenant lifecycle automation)infra/terraform/*(future reusable IaC modules and tenant stacks)packages/shared-config-schema/*(shared config contracts).env.templates/*(environment templates)
Current production runtime remains on existing folders (frontend/, backend/) until migration cutover.
- Source Connections: Connect to Redshift (production) or DuckDB (local development)
- Destination Connectors: Braze and Attentive with proper payload builders
- Sync Modes: Full refresh and incremental sync with checkpoint tracking
- Field Mapping: UI-driven column mapping from source to destination
- Airflow Integration: DAGs for scheduled sync job orchestration
- Run History: Complete sync run tracking with logs and error details
- Schema Change Detection: Automatic detection of source table changes
- Secure Credentials: Encrypted storage for API keys and passwords
┌─────────────────────────────────────────────────────────────────┐
│ Alo ActivationOS │
├─────────────────────────────────────────────────────────────────┤
│ Frontend (React) │
│ └── Dashboard, Sources, Destinations, Syncs, Runs │
├─────────────────────────────────────────────────────────────────┤
│ Backend (FastAPI) │
│ ├── API Routes (sources, destinations, syncs, runs) │
│ ├── Services (connection, sync, engine) │
│ └── Adapters (sources: DuckDB/Redshift, destinations: Braze/Attentive)│
├─────────────────────────────────────────────────────────────────┤
│ Database (PostgreSQL) │
│ └── Connections, Sync Jobs, Field Mappings, Sync Runs │
├─────────────────────────────────────────────────────────────────┤
│ Scheduler (Airflow) │
│ └── DAGs for sync job orchestration │
└─────────────────────────────────────────────────────────────────┘
- Python 3.9+
- Node.js 18+
- PostgreSQL (optional, SQLite fallback available)
- Apache Airflow (optional, for scheduling)
# Create virtual environment (repo standard: .venv)
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# or: .\\.venv\\Scripts\\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Copy environment file
cp env.example .env
# Edit .env with your settings
# Run backend
cd backend
python main.pyBackend runs at: http://localhost:8000 API docs: http://localhost:8000/api/docs
cd frontend
# Install dependencies
npm install
# Run development server
npm run devFrontend runs at: http://localhost:5173
Prerequisites: Docker with Compose v2 (docker compose).
| Action | Command |
|---|---|
| Start (app + warehouse + Cube + journeys + nginx) | ./start.sh |
| Stop (same compose files) | ./stop.sh |
On first start, if cube/.env is missing it is copied from cube/.env.example.
URLs after ./start.sh
| Service | URL / port |
|---|---|
| Unified UI (nginx: frontend + API + Cube + Journeys on one origin) | http://localhost/ |
| Vite dev server (direct) | http://localhost:5173 |
| FastAPI + OpenAPI | http://localhost:8000 — docs at http://localhost:8000/api/docs |
| Cube (playground / REST) | http://localhost:4001 |
| MinIO console | http://localhost:9001 |
| App metadata Postgres | localhost:5432 |
Warehouse Postgres (gold.* CDP marts) |
localhost:5433 (defaults: db cdp_warehouse, user/password cdp) |
| Redis | 6379 |
| MinIO S3 API | 9000 |
| Journeys stack (Dittofeed deps) | Postgres 25432, ClickHouse 8125, Temporal 27233 |
The backend container uses WAREHOUSE_MODE=postgres and reads from warehouse-postgres. Cube uses the same warehouse. If you have not loaded data, the UI still runs but customer counts and Cube-driven widgets may stay empty until you seed (next section).
Apple Silicon vs x86: cube/docker-compose.cube.yml pins cubejs/cubestore:arm64v8. On Intel Linux or Intel Mac, if Cube Store fails to pull or start, switch that image to cubejs/cubestore:latest (see cube/README.md).
warehouse-postgres is the local analytic database (host port 5433). The backend and Cube expect gold.* tables there when running the Docker stack.
Fresh data vs stale DuckDB: Files under data/demo/*.duckdb are snapshots and fall behind live Redshift gold.*. Whenever you need warehouse-postgres to mirror current Redshift dev, refresh from Redshift first (tunnel up), then seed—do not rely only on old DuckDB files checked into the repo.
Redshift dev tunnel: Point .env at your tunnel (common dev mapping: 127.0.0.1:10005 or localhost:10005 — see env.example REDSHIFT_HOST / REDSHIFT_PORT). The snapshot script checks connectivity before copying.
See also: docs/WAREHOUSE_REDSHIFT_CRM_AIRFLOW.md (Redshift vs API delivery patterns, Airflow).
Path A — Redshift → DuckDB → warehouse-postgres (typical for engineers)
DuckDB here is only a staging format for seed_warehouse_from_duckdb.py; the source of truth for refreshed marts is Redshift each time you run the snapshot.
-
Add Redshift (and tunnel) settings to a repo-root
.env— start fromenv.exampleand follow internal docs for host/port (oftenlocalhost+ tunnel port). -
Ensure the tunnel or network path to Redshift matches your
.env(dev example:127.0.0.1:10005→ Redshift, as inenv.example). -
Snapshot allowlisted
gold.*tables into DuckDB (from repo root, with.venvactivated andpip install -r requirements.txt):python3 scripts/snapshot_redshift_to_duckdb.py --out data/demo/activationos_demo.duckdb --max-rows-per-table 5000
-
The warehouse seeder expects
data/demo/activationos_demo_2.duckdb. After a snapshot, align the name:mkdir -p data/demo cp data/demo/activationos_demo.duckdb data/demo/activationos_demo_2.duckdb
-
Build the small
order_line_factDuckDB used alongside the main demo file (if you do not already havedata/demo/customer_mart.duckdb):python scripts/build_demo_customer_mart_duckdb.py --out data/demo/customer_mart.duckdb
-
Start Docker (
./start.sh) sowarehouse-postgresis up on 5433. -
Seed Postgres using DuckDB’s postgres extension (same shell, repo root):
export WAREHOUSE_POSTGRES_HOST=localhost export WAREHOUSE_POSTGRES_PORT=5433 export WAREHOUSE_POSTGRES_DB=cdp_warehouse export WAREHOUSE_POSTGRES_USER=cdp export WAREHOUSE_POSTGRES_PASSWORD=cdp python cube/scripts/seed_warehouse_from_duckdb.py
-
(Recommended) Apply warehouse indexes for faster slices and extracts:
docker compose -f docker-compose.yml -f cube/docker-compose.cube.yml exec -T warehouse-postgres \ psql -U cdp -d cdp_warehouse < cube/scripts/warehouse_indexes.sql
Use the same
-flist you use to run the stack; container names may include a project prefix (checkdocker compose ... psifexecfails).
Path B — Redshift → warehouse-postgres via Meltano (full ETL parity)
For incremental replication and the same pipeline shape as production, use the Meltano / alo-meltano local-demo style workflow described in cube/README.md (Redshift tap → Postgres warehouse). Cube and the backend then consume warehouse-postgres without the intermediate DuckDB snapshot.
Path C — DuckDB-only app (no warehouse-postgres)
To run the API against a single DuckDB file instead of warehouse Postgres, use WAREHOUSE_MODE=duckdb and set DUCKDB_PATH to your snapshot (see Local demo snapshot (Redshift → DuckDB) under Development below). That path does not populate warehouse-postgres; it is aimed at lightweight or offline API use rather than the full Docker + Cube stack.
Used for scheduled and on-demand triggers of sync jobs (API pattern). Set Variable activationos_api_url (e.g. http://localhost:8000/api/v1). DAGs under airflow/dags/ call the ActivationOS API (e.g. activationos_sync_dag.py). For materializing tables in Redshift for a CRM team (warehouse pattern), run dbt / Meltano / SQL jobs from Airflow as separate DAGs—see docs/WAREHOUSE_REDSHIFT_CRM_AIRFLOW.md.
# Initialize Airflow database
airflow db init
# Create admin user
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com \
--password admin
# Set Airflow variables
airflow variables set activationos_api_url http://localhost:8000/api/v1
# Copy DAGs
cp airflow/dags/*.py $AIRFLOW_HOME/dags/
# Start Airflow
airflow webserver --port 8080 &
airflow scheduler &cd platform/dbt
# Create dedicated virtual environment for dbt
python -m venv .venv
source .venv/bin/activate
# Install dbt dependencies
pip install -r requirements.txt
dbt deps
# Configure profile
cp profiles.yml.example ~/.dbt/profiles.yml
# Validate and run
dbt debug
dbt buildThe dbt project produces marts for:
mart_customer_360mart_segment_basemart_activation_performancemart_ai_customer_insights_teammart_ai_audience_recommendations_team
GET /api/v1/sources- List source connectionsPOST /api/v1/sources- Create source connectionPOST /api/v1/sources/{id}/test- Test source connectionGET /api/v1/sources/{id}/schemas- Get schemasGET /api/v1/sources/{id}/schemas/{schema}/tables- Get tables
GET /api/v1/destinations- List destination connectionsPOST /api/v1/destinations- Create destination connectionPOST /api/v1/destinations/{id}/test- Test destination connection
GET /api/v1/syncs- List sync jobsPOST /api/v1/syncs- Create sync jobPOST /api/v1/syncs/{id}/trigger- Trigger sync runPUT /api/v1/syncs/{id}/mappings- Update field mappingsPOST /api/v1/syncs/{id}/pause- Pause sync jobPOST /api/v1/syncs/{id}/resume- Resume sync job
GET /api/v1/runs- List sync runsGET /api/v1/runs/{run_id}- Get run detailsGET /api/v1/runs/stats/summary- Get run statistics
- Reads all data from source
- Sends complete dataset to destination
- No checkpoint tracking
- Requires
incremental_column(e.g.,updated_at) - Tracks last synced value as checkpoint
- Only syncs records where
incremental_column > checkpoint - Checkpoint only saved on successful completion
{
"attributes": [
{
"external_id": "user_123",
"email": "user@example.com",
"first_name": "John",
"custom_attribute": "value"
}
]
}{
"subscribers": [
{
"phone": "+14155551234",
"email": "user@example.com",
"externalId": "user_123",
"customAttributes": {}
}
]
}Alo ActivationOS/
├── backend/
│ ├── app/
│ │ ├── adapters/
│ │ │ ├── sources/ # DuckDB, Redshift adapters
│ │ │ └── destinations/ # Braze, Attentive adapters
│ │ ├── api/routes/ # FastAPI routes
│ │ ├── core/ # Config, security, logging
│ │ ├── db/ # SQLAlchemy setup
│ │ ├── models/ # SQLAlchemy models
│ │ ├── schemas/ # Pydantic schemas
│ │ └── services/ # Business logic
│ └── main.py
├── frontend/
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── pages/ # Page components
│ │ ├── lib/ # API client, utils
│ │ └── types/ # TypeScript types
│ └── package.json
├── airflow/
│ └── dags/ # Airflow DAGs
├── requirements.txt
└── README.md
| Variable | Description | Default |
|---|---|---|
WAREHOUSE_MODE |
Warehouse mode: redshift (live) or duckdb (demo snapshot) |
redshift |
DUCKDB_PATH |
DuckDB database path | :memory: |
POSTGRES_* |
PostgreSQL connection settings | localhost |
ENCRYPTION_KEY |
Key for encrypting credentials | Required |
MAX_RETRIES |
Max retry attempts for failed syncs | 3 |
SYNC_BATCH_SIZE |
Records per batch | 1000 |
For the Docker stack with warehouse-postgres and Cube, prefer the Preparing warehouse-postgres steps above (snapshot → seed). The following is the DuckDB-only backend mode (WAREHOUSE_MODE=duckdb).
Production/dev is Redshift live. For local demos (or offline-ish exploration), generate a DuckDB snapshot from the same allowlisted marts used by Ask/Reference:
First ensure your Redshift tunnel is running (so localhost:10005 is reachable).
python3 scripts/snapshot_redshift_to_duckdb.py --out data/demo/activationos_demo.duckdb --max-rows-per-table 5000Then run the app with:
export WAREHOUSE_MODE=duckdb
export DUCKDB_PATH=data/demo/activationos_demo.duckdb# Run tests
pytest
# With coverage
pytest --cov=appMIT