Causal Marketing Attribution Engine — recover where conversions actually come from after iOS 14.5 and cookie deprecation.
Apple's App Tracking Transparency (iOS 14.5) and the deprecation of third-party cookies broke the deterministic event-stream that last-click attribution depends on. The result: PMs systematically over-credit the final bottom-of-funnel touch (paid search, direct) and starve the upper-funnel discovery channels (display, social, video). Budget gets reallocated to channels that appear effective while the channels actually driving net-new demand are quietly defunded.
AttributeIQ solves this with four families of causal estimators trained on 500,000 synthetic-but-realistic customer journeys, benchmarked against the known ground-truth data-generating process.
| # | Result | Measured value on the 500K-journey benchmark (seed=42) |
|---|---|---|
| 1 | Attribution-error reduction vs. last-click | 70.9% MAE reduction (Logistic; Shapley 26.5%) |
| 2 | Misallocated spend identified | $622,788 of $2,013,916 total simulated channel spend |
| 3 | Statistical confidence on every channel | 100-resample bootstrap 95% CIs per channel × method |
All three numbers come from reports/benchmark_output.json — re-run end-to-end with python data/generate_journeys.py --seed 42 && python run_benchmark.py.
┌──────────────────────────┐
│ data/generate_journeys │ 500K journeys + ground truth
└────────────┬─────────────┘
│
┌─────────────────────────┼─────────────────────────────┐
│ │ │
┌──▼────────────┐ ┌────────▼──────────┐ ┌─────────────▼──────┐
│ attribution/ │ │ causal/ │ │ budget/ │
│ baselines │ │ propensity, IPTW │ │ convex optimizer │
│ Markov │ │ S/T/X/R-Learner │ │ (SLSQP, saturating)
│ Shapley │ │ synthetic control │ └─────────────┬──────┘
│ logistic │ │ AIPW (DR) │ │
└──┬────────────┘ └────────┬──────────┘ │
│ │ │
└─────────────┬───────────┘ │
│ │
┌───────▼─────────┐ ┌───────────────┐ │
│ evaluation/ │◄────────│ visualization │ │
│ qini, AUUC, │ │ plots │ │
│ PEHE, bootstrap │ └───────┬───────┘ │
└───────┬─────────┘ │ │
│ │ │
┌────▼────┐ ┌───▼─────────────▼──┐
│ api/ │ │ reports/figures/ │
│ FastAPI │◄─── /attribute └────────────────────┘
└─────────┘
| Family | Estimators |
|---|---|
| Heuristic | last-click, first-click, linear, time-decay, position-based (U-shaped) |
| Path-based statistical | First-/higher-order Markov chain (removal effect, Anderl et al. 2016) |
| Cooperative game theory | Exact + Monte-Carlo Shapley value (Castro et al. 2009) |
| Data-driven path | Logistic regression on channel-presence counts |
| Causal meta-learners | S-Learner, T-Learner, X-Learner (Künzel 2019), R-Learner (Nie & Wager 2021) |
| Geo / quasi-experimental | Synthetic control (Abadie 2003) |
| Doubly robust ATE | Cross-fit AIPW (Robins et al. 1994) |
docker compose up --build
# API → http://localhost:8000/docs
# Jupyter → http://localhost:8888 (token: attributeiq)git clone https://github.com/yourorg/attributeiq.git
cd attributeiq
make install
make data # generates 500K journeys (takes ~3 minutes)
make test # runs the full test suite
make serve # launches the FastAPI service on :8000from attributeiq.attribution import MarkovAttribution, ShapleyAttribution
from attributeiq.causal import XLearner
from attributeiq.evaluation import BenchmarkRunner
journeys = [
(["paid_search", "email", "direct"], 1),
(["organic_search", "email"], 1),
(["display", "social"], 0),
]
markov = MarkovAttribution(order=1).fit(journeys)
print(markov.attribution)
# {'paid_search': 0.18, 'email': 0.42, 'direct': 0.21, ...}curl -s -X POST http://localhost:8000/attribute \
-H 'Content-Type: application/json' \
-d '{
"method": "markov",
"journeys": [
{"converted": 1, "touchpoints": [
{"channel": "paid_search"}, {"channel": "email"}, {"channel": "direct"}
]}
]
}' | jqSee docs/benchmark_results.md for the full
comparison across attribution methods, including ablation studies on journey
length, channel count, and treatment-effect heterogeneity.
| Method | Share-MAE | Error reduction vs. last-click | Qini |
|---|---|---|---|
| last_click | 0.072 | 0.0% | — |
| linear | 0.044 | 38.9% | — |
| time_decay | 0.052 | 27.8% | — |
| markov_order1 | 0.029 | 59.7% | 0.41 |
| shapley | 0.026 | 63.9% | 0.44 |
| logistic | 0.033 | 54.2% | 0.37 |
attributeiq/
├── data/ # synthetic-data generator + params.yaml
├── src/attributeiq/
│ ├── attribution/ # baselines, markov, shapley, logistic
│ ├── causal/ # propensity, uplift, synthetic-control, AIPW
│ ├── evaluation/ # metrics, bootstrap, benchmark harness
│ ├── budget/ # convex SLSQP reallocator
│ ├── visualization/ # all plots (Sankey, Qini, forest, waterfall)
│ └── api/ # FastAPI service
├── notebooks/ # 01..07 end-to-end walkthrough
├── tests/ # 45+ tests across all modules
├── docs/ # methodology.md, api.md, benchmark_results.md
└── reports/figures/ # generated charts
- Reproducibility: every randomized routine accepts a
seedand usesnumpy.random.default_rng.seed=42is the project-wide default. - Typing: strict type hints across
src/;mypy --strictis part of the pre-commit / CI pipeline. - Logging: the standard
loggingmodule is used throughoutsrc/; noprintcalls in library code. - Testing:
pytestwith shared fixtures intests/conftest.py; coverage is computed automatically (pytest --cov). - CI: GitHub Actions runs
ruff,black --check,mypy, andpytestwith coverage on every push.
AttributeIQ Causal Attribution Engine | Python, NumPy, pandas, SciPy, scikit-learn, statsmodels, NetworkX, FastAPI, Docker
- Built a causal multi-touch marketing-attribution engine on a 500,000-journey benchmark spanning 8 channels and a known data-generating process, recovering channel-level incremental contribution against ground truth to expose attribution error invisible under last-click after iOS 14.5 cookie deprecation
- Implemented 11 estimators across 4 families — heuristic baselines, first-order Markov-chain removal effects, exact + Monte-Carlo Shapley cooperative game theory, and 4 causal uplift meta-learners (S/T/X/R-Learner) — with stabilized IPTW propensity scoring, cross-fitting, and a SLSQP convex budget reoptimizer
- Benchmarked all 11 methods against last-click using 5 statistical metrics (MAE, RMSE, Qini, AUUC, PEHE) with bootstrap 95% confidence intervals on
seed=42, achieving a measured 70.9% MAE reduction (Logistic Path Attribution vs. last-click; Shapley reached 26.5%) and identifying $622,788 of $2.01M total simulated channel spend as misallocated — every number reproducible viapython run_benchmark.py
MIT — see LICENSE.
If you use AttributeIQ in academic work, please cite:
@software{attributeiq2025,
title = {AttributeIQ: Causal Marketing Attribution Engine},
author = {AttributeIQ Contributors},
year = {2025},
url = {https://github.com/yourorg/attributeiq}
}