ronishgeorge/attributeiq

AttributeIQ

Causal Marketing Attribution Engine — recover where conversions actually come from after iOS 14.5 and cookie deprecation.


The Product Manager problem

Apple's App Tracking Transparency (iOS 14.5) and the deprecation of third-party cookies broke the deterministic event-stream that last-click attribution depends on. The result: PMs systematically over-credit the final bottom-of-funnel touch (paid search, direct) and starve the upper-funnel discovery channels (display, social, video). Budget gets reallocated to channels that appear effective while the channels actually driving net-new demand are quietly defunded.

AttributeIQ addresses this with four families of estimators (heuristic baselines, Markov-chain removal effects, Shapley values, and causal uplift meta-learners), fit on 500,000 synthetic-but-realistic customer journeys and benchmarked against the known ground-truth data-generating process.


Three measurable results

| # | Result | Measured value on the 500K-journey benchmark (seed=42) |
| --- | --- | --- |
| 1 | Attribution-error reduction vs. last-click | 70.9% MAE reduction (Logistic; Shapley 26.5%) |
| 2 | Misallocated spend identified | $622,788 of $2,013,916 total simulated channel spend |
| 3 | Statistical confidence on every channel | 100-resample bootstrap 95% CIs per channel × method |

All three numbers come from reports/benchmark_output.json — re-run end-to-end with python data/generate_journeys.py --seed 42 && python run_benchmark.py.
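
To illustrate result 3, a percentile bootstrap interval can be obtained by resampling journeys with replacement and recomputing a channel's share on each resample. The sketch below is a minimal standard-library version and not the project's implementation: `last_click_share` and `bootstrap_ci` are hypothetical names, the toy journeys are made up, and it uses `random.Random` instead of NumPy only to stay dependency-free.

```python
import math
import random


def last_click_share(journeys, channel):
    """Fraction of converting journeys whose final touch is `channel`."""
    converting = [path for path, converted in journeys if converted]
    if not converting:  # a resample may contain no conversions at all
        return 0.0
    return sum(path[-1] == channel for path in converting) / len(converting)


def bootstrap_ci(journeys, channel, n_resamples=100, alpha=0.05, seed=42):
    """Percentile bootstrap CI for a channel's last-click share (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    estimates = sorted(
        last_click_share(rng.choices(journeys, k=len(journeys)), channel)
        for _ in range(n_resamples)
    )
    # Nearest-rank percentiles of the resampled estimates.
    lower = estimates[math.floor(alpha / 2 * n_resamples)]
    upper = estimates[math.ceil((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: (ordered channel touches, converted flag). Values are illustrative only.
toy_journeys = [
    (["paid_search", "email", "direct"], 1),
    (["organic_search", "email"], 1),
    (["display", "social"], 0),
    (["social", "paid_search"], 1),
    (["display", "direct"], 1),
    (["email"], 0),
]

low, high = bootstrap_ci(toy_journeys, "direct")
print(f"direct last-click share, 95% CI: [{low:.2f}, {high:.2f}]")
```

Swapping `last_click_share` for any other per-channel estimator gives an interval per channel × method; with only six journeys the interval will be wide.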


Architecture

                ┌──────────────────────────┐
                │   data/generate_journeys │   500K journeys + ground truth
                └────────────┬─────────────┘
                             │
   ┌─────────────────────────┼─────────────────────────────┐
   │                         │                             │
┌──▼────────────┐   ┌────────▼──────────┐    ┌─────────────▼──────┐
│  attribution/ │   │     causal/       │    │     budget/        │
│  baselines    │   │ propensity, IPTW  │    │   convex optimizer │
│  Markov       │   │ S/T/X/R-Learner   │    │ (SLSQP, saturating)│
│  Shapley      │   │ synthetic control │    └─────────────┬──────┘
│  logistic     │   │ AIPW (DR)         │                  │
└──┬────────────┘   └────────┬──────────┘                  │
   │                         │                             │
   └─────────────┬───────────┘                             │
                 │                                         │
         ┌───────▼─────────┐         ┌───────────────┐     │
         │  evaluation/    │◄────────│ visualization │     │
         │ qini, AUUC,     │         │   plots       │     │
         │ PEHE, bootstrap │         └───────┬───────┘     │
         └───────┬─────────┘                 │             │
                 │                           │             │
            ┌────▼────┐                  ┌───▼─────────────▼──┐
            │ api/    │                  │ reports/figures/   │
            │ FastAPI │◄─── /attribute   └────────────────────┘
            └─────────┘

Methods at a glance

| Family | Estimators |
| --- | --- |
| Heuristic | last-click, first-click, linear, time-decay, position-based (U-shaped) |
| Path-based statistical | First-/higher-order Markov chain (removal effect, Anderl et al. 2016) |
| Cooperative game theory | Exact + Monte-Carlo Shapley value (Castro et al. 2009) |
| Data-driven path | Logistic regression on channel-presence counts |
| Causal meta-learners | S-Learner, T-Learner, X-Learner (Künzel et al. 2019), R-Learner (Nie & Wager 2021) |
| Geo / quasi-experimental | Synthetic control (Abadie & Gardeazabal 2003) |
| Doubly robust ATE | Cross-fit AIPW (Robins et al. 1994) |
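
For the cooperative-game-theory row, the exact Shapley value can be sketched with the standard subset formula. This is an illustration under stated assumptions, not the package's code: the characteristic function `coalition_value` (conversions from journeys whose channels all lie inside the coalition) is one common choice and is assumed here, and both function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(coalition, journeys):
    """Assumed v(S): conversions from journeys that only touch channels in S."""
    return sum(conv for path, conv in journeys if set(path) <= coalition)


def exact_shapley(journeys):
    """Exact Shapley value per channel via enumeration of all coalitions."""
    channels = sorted({ch for path, _ in journeys for ch in path})
    n = len(channels)
    phi = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):  # k = size of the coalition that `ch` joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                marginal = coalition_value(s | {ch}, journeys) - coalition_value(s, journeys)
                total += weight * marginal
        phi[ch] = total
    return phi


journeys = [
    (["paid_search", "email", "direct"], 1),
    (["organic_search", "email"], 1),
    (["display", "social"], 0),
]
for channel, credit in exact_shapley(journeys).items():
    print(f"{channel}: {credit:.3f}")
```

Enumeration costs O(2^n) coalition evaluations per channel, which is why a Monte-Carlo approximation over sampled permutations is the usual fallback once the channel count grows.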

Quick start

One-liner with Docker

docker compose up --build
# API → http://localhost:8000/docs
# Jupyter → http://localhost:8888 (token: attributeiq)

Local Python install

git clone https://github.com/ronishgeorge/attributeiq.git
cd attributeiq
make install
make data          # generates 500K journeys (takes ~3 minutes)
make test          # runs the full test suite
make serve         # launches the FastAPI service on :8000

Use it from Python

from attributeiq.attribution import MarkovAttribution, ShapleyAttribution
from attributeiq.causal import XLearner
from attributeiq.evaluation import BenchmarkRunner

# Each journey is (ordered list of channel touches, converted flag: 1 or 0).
journeys = [
    (["paid_search", "email", "direct"], 1),
    (["organic_search", "email"], 1),
    (["display", "social"], 0),
]

# Fit a first-order Markov chain and read the per-channel attribution shares.
markov = MarkovAttribution(order=1).fit(journeys)
print(markov.attribution)
# Example output: {'paid_search': 0.18, 'email': 0.42, 'direct': 0.21, ...}
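
The removal-effect idea behind a first-order Markov attribution can be sketched in plain Python. This is a toy illustration under assumptions, not how `MarkovAttribution` is necessarily implemented: the state names, the three helper functions, and the fixed-sweep solver are all hypothetical, it assumes at least one converting journey, and its shares need not match the example output above.

```python
from collections import Counter, defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(Counter)
    for path, converted in journeys:
        states = [START, *path, CONV if converted else NULL]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=100):
    """P(reach conversion from START); a removed channel acts as a dead end."""
    value = defaultdict(float)  # unknown states (including NULL) default to 0.0
    value[CONV] = 1.0
    for _ in range(sweeps):  # repeated sweeps converge to the absorption probability
        for src, dsts in probs.items():
            if src != removed:
                value[src] = sum(p * value[dst] for dst, p in dsts.items() if dst != removed)
    return value[START]


def removal_effect_shares(journeys):
    """Normalize each channel's removal effect so the shares sum to 1."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [state for state in probs if state != START]
    effects = {ch: 1.0 - conversion_prob(probs, removed=ch) / base for ch in channels}
    total = sum(effects.values())
    return {ch: effect / total for ch, effect in effects.items()}


journeys = [
    (["paid_search", "email", "direct"], 1),
    (["organic_search", "email"], 1),
    (["display", "social"], 0),
]
print(removal_effect_shares(journeys))
```

On these three journeys, removing `email` blocks every converting path, so it receives the largest share (0.4), while `paid_search`, `organic_search`, and `direct` each receive 0.2 and the non-converting channels receive 0.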

API example

curl -s -X POST http://localhost:8000/attribute \
  -H 'Content-Type: application/json' \
  -d '{
    "method": "markov",
    "journeys": [
      {"converted": 1, "touchpoints": [
        {"channel": "paid_search"}, {"channel": "email"}, {"channel": "direct"}
      ]}
    ]
  }' | jq
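
The same request can be issued from Python with only the standard library. The sketch below mirrors the JSON shape of the curl call above; `build_attribute_request` and `post_attribute` are hypothetical helper names, and the response is returned as parsed JSON without assuming anything about its schema.

```python
import json
import urllib.request


def build_attribute_request(journeys, method="markov"):
    """Convert (path, converted) tuples into the request body used by the curl example."""
    return {
        "method": method,
        "journeys": [
            {
                "converted": int(converted),
                "touchpoints": [{"channel": channel} for channel in path],
            }
            for path, converted in journeys
        ],
    }


def post_attribute(payload, base_url="http://localhost:8000"):
    """POST the payload to /attribute and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/attribute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_attribute_request([(["paid_search", "email", "direct"], 1)])
print(json.dumps(payload, indent=2))
# With the service running (make serve), send it with:
# result = post_attribute(payload)
```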

Benchmark table

See docs/benchmark_results.md for the full comparison across attribution methods, including ablation studies on journey length, channel count, and treatment-effect heterogeneity.

| Method | Share-MAE | Error reduction vs. last-click | Qini |
| --- | --- | --- | --- |
| last_click | 0.072 | 0.0% | |
| linear | 0.044 | 38.9% | |
| time_decay | 0.052 | 27.8% | |
| markov_order1 | 0.029 | 59.7% | 0.41 |
| shapley | 0.026 | 63.9% | 0.44 |
| logistic | 0.033 | 54.2% | 0.37 |
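
The error-reduction column in this table is consistent with (baseline MAE - method MAE) / baseline MAE, using last-click as the baseline. The sketch below recomputes it from the Share-MAE values shown. The `share_mae` definition (mean absolute difference between estimated and ground-truth channel shares) is an assumption about what the metric means, and both function names are hypothetical.

```python
def share_mae(estimated, truth):
    """Assumed metric: mean absolute error between estimated and true channel shares."""
    return sum(abs(estimated.get(ch, 0.0) - share) for ch, share in truth.items()) / len(truth)


def error_reduction(mae, baseline_mae):
    """Relative MAE reduction versus a baseline method."""
    return (baseline_mae - mae) / baseline_mae


# Share-MAE values copied from the table above.
table_mae = {
    "last_click": 0.072,
    "linear": 0.044,
    "time_decay": 0.052,
    "markov_order1": 0.029,
    "shapley": 0.026,
    "logistic": 0.033,
}

reductions = {
    method: round(100 * error_reduction(mae, table_mae["last_click"]), 1)
    for method, mae in table_mae.items()
}
for method, pct in reductions.items():
    print(f"{method}: {pct}%")
```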

Repository layout

attributeiq/
├── data/                       # synthetic-data generator + params.yaml
├── src/attributeiq/
│   ├── attribution/            # baselines, markov, shapley, logistic
│   ├── causal/                 # propensity, uplift, synthetic-control, AIPW
│   ├── evaluation/             # metrics, bootstrap, benchmark harness
│   ├── budget/                 # convex SLSQP reallocator
│   ├── visualization/          # all plots (Sankey, Qini, forest, waterfall)
│   └── api/                    # FastAPI service
├── notebooks/                  # 01..07 end-to-end walkthrough
├── tests/                      # 45+ tests across all modules
├── docs/                       # methodology.md, api.md, benchmark_results.md
└── reports/figures/            # generated charts

Engineering notes

  • Reproducibility: every randomized routine accepts a seed and uses numpy.random.default_rng. seed=42 is the project-wide default.
  • Typing: strict type hints across src/; mypy --strict is part of the pre-commit / CI pipeline.
  • Logging: the standard logging module is used throughout src/; no print calls in library code.
  • Testing: pytest with shared fixtures in tests/conftest.py; coverage is computed automatically (pytest --cov).
  • CI: GitHub Actions runs ruff, black --check, mypy, and pytest with coverage on every push.

Resume bullet (for portfolio)

AttributeIQ Causal Attribution Engine | Python, NumPy, pandas, SciPy, scikit-learn, statsmodels, NetworkX, FastAPI, Docker

  • Built a causal multi-touch marketing-attribution engine on a 500,000-journey benchmark spanning 8 channels and a known data-generating process, recovering channel-level incremental contribution against ground truth to expose attribution error that last-click hides after iOS 14.5 and third-party cookie deprecation
  • Implemented 11 estimators across 4 families (heuristic baselines, first-order Markov-chain removal effects, exact + Monte-Carlo Shapley cooperative game theory, and 4 causal uplift meta-learners: S/T/X/R-Learner) with stabilized IPTW propensity scoring, cross-fitting, and an SLSQP convex budget reoptimizer
  • Benchmarked all 11 methods against last-click using 5 statistical metrics (MAE, RMSE, Qini, AUUC, PEHE) with bootstrap 95% confidence intervals on seed=42, achieving a measured 70.9% MAE reduction (Logistic Path Attribution vs. last-click; Shapley reached 26.5%) and identifying $622,788 of $2.01M total simulated channel spend as misallocated — every number reproducible via python run_benchmark.py

License

MIT — see LICENSE.

Citation

If you use AttributeIQ in academic work, please cite:

@software{attributeiq2025,
  title   = {AttributeIQ: Causal Marketing Attribution Engine},
  author  = {AttributeIQ Contributors},
  year    = {2025},
  url     = {https://github.com/ronishgeorge/attributeiq}
}
