cms-flaf · kandrosov · Jun 21, 2026
diff --git a/.github/workflows/deploy-docs.yaml b/.github/workflows/deploy-docs.yaml
@@ -0,0 +1,48 @@
+name: Deploy documentation
+
+on:
+  push:
+    branches: [ main ]
+    paths:
+      - "docs/**"
+      - "mkdocs.yml"
+      - ".github/workflows/deploy-docs.yaml"
+  pull_request:
+    paths:
+      - "docs/**"
+      - "mkdocs.yml"
+      - ".github/workflows/deploy-docs.yaml"
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+      - name: Install MkDocs Material
+        run: pip install mkdocs-material
+      - name: Build site (strict)
+        run: mkdocs build --strict
+
+  deploy:
+    needs: validate
+    if: github.event_name != 'pull_request'
+    runs-on: ubuntu-latest
+    concurrency:
+      group: deploy-docs
+      cancel-in-progress: false
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+      - name: Install MkDocs Material
+        run: pip install mkdocs-material
+      - name: Deploy to GitHub Pages
+        run: mkdocs gh-deploy --force
diff --git a/README.md b/README.md
@@ -1,8 +1,13 @@
-# HH -> bbtautau Framework
+# FLAF
 
-FLAF - Flexible LAW-based Analysis Framework.
-Task workflow managed is done via [LAW](https://github.com/riga/law) (Luigi Analysis Framework).
+**FLAF** — the Flexible LAW-based Analysis Framework — is the shared CMS analysis framework used by
+the HH→bb̄ττ, HH→bb̄WW and H→μμ analyses. Task workflows are managed with
+[LAW](https://github.com/riga/law) (the Luigi Analysis Workflow).
 
-Documentation is available on [GitHub Pages](https://cms-flaf.github.io/FLAF/).
+📖 **Documentation: <https://cms-flaf.github.io/FLAF/>**
+
+FLAF is included as a git submodule inside each analysis repository — you do not clone it on its own
+to run an analysis. Start with the
+[installation guide](https://cms-flaf.github.io/FLAF/getting-started/installation/).
 
 
diff --git a/docs/analyses.md b/docs/analyses.md
@@ -0,0 +1,41 @@
+# Analyses
+
+FLAF is shared by three analyses. The **common** pipeline is documented here; each analysis adds
+its own physics — extra submodules, observables, signals and (for the HH analyses) statistical
+inference — documented in that analysis's own `docs/`.
+
+| Analysis | Channel | Adds on top of FLAF | Docs |
+|---|---|---|---|
+| **HH→bb̄ττ** | HH → bb̄ττ | SVfit (`ClassicSVfit`, `SVfitTF`), `HHKinFit2`, `HHbtag`, DeepTau; resonant + non-resonant signals; `StatInference`. | [github.com/cms-flaf/HH_bbtautau](https://github.com/cms-flaf/HH_bbtautau) → `docs/` |
+| **HH→bb̄WW** | HH → bb̄WW | `DeepHME` mass reconstruction; b-tag-shape caching (`AnalysisCacheTask`); `StatInference`. | [github.com/cms-flaf/HH_bbWW](https://github.com/cms-flaf/HH_bbWW) → `docs/` |
+| **H→μμ** | H → μμ | Single-Higgs; the simplest setup (just `FLAF` + `Corrections`); **no** statistical-inference submodule. | [github.com/cms-flaf/H_mumu](https://github.com/cms-flaf/H_mumu) → `docs/` |
+
+## What is common vs analysis-specific
+
+- **Common (here, in FLAF):** the [task graph](concepts/data-flow.md), the
+  [configuration system](concepts/configuration.md), the [environment](concepts/environment.md),
+  [storage](concepts/storage.md), [eras](concepts/eras.md) and [CI](ci/integration-pipeline.md).
+  The [full-workflow walkthrough](workflow/walkthrough.md) applies to every analysis.
+- **Analysis-specific (in each repo's `docs/`):** the extra physics submodules and how to set them
+  up, the analysis's signals and processes, its observables and any analysis-only steps, and — for
+  HH→bb̄ττ and HH→bb̄WW — the statistical-inference configuration.
+
+## HH→bb̄ττ — the reference analysis
+
+The most feature-complete analysis: SVfit and HHKinFit2 mass reconstruction, the HHbtag b-jet
+identifier, DeepTau-based τ identification (select the version with
+`--customisations deepTauVersion=2p5`), and resonant + non-resonant signal models. Used throughout
+these docs as the worked example.
+
+## HH→bb̄WW
+
+Uses `DeepHME` for mass reconstruction instead of SVfit. Its pipeline inserts a b-tag-shape caching
+step (`AnalysisCacheTask`/`AnalysisCacheAggregationTask`) before histogramming — see the caveat in
+the [walkthrough](workflow/walkthrough.md#stage-2-compute-analysis-observables-histtuples) and
+[Task reference](reference/tasks.md#analysiscachetask).
+
+## H→μμ
+
+A single-Higgs analysis with the leanest submodule set (no `StatInference`/`inference`). Its CI
+runs over **all** Run 3 eras (`H_mumu_eras: ALL`), and its CI process names are lower-case
+(`custom_CI_signal`, …) — see [Processes & models](configuration/processes-and-models.md).
diff --git a/docs/analysis.md b/docs/analysis.md
diff --git a/docs/ci/github-actions.md b/docs/ci/github-actions.md
@@ -0,0 +1,67 @@
+# GitHub Actions
+
+FLAF uses **two** continuous-integration systems:
+
+| System | Where | Purpose |
+|---|---|---|
+| **GitHub Actions** | GitHub | Fast code-quality and sanity checks on every pull request. |
+| **FLAF integration** | GitLab CI (CERN) | The full pipeline run that checks physics correctness. Triggered by a bot comment — see [Integration pipeline](integration-pipeline.md). |
+
+This page covers the GitHub Actions checks.
+
+## Shared, reusable workflows
+
+The analysis repositories don't duplicate CI logic. Each workflow is a thin wrapper that calls the
+shared implementation in FLAF:
+
+```yaml
+jobs:
+  my-job:
+    uses: cms-flaf/FLAF/.github/workflows/<workflow>.yaml@main
+    secrets: inherit
+```
+
+So fixing a check in FLAF fixes it everywhere. (A checkout helper inside the shared workflows makes
+the FLAF tooling — `.yamllint`, `.clang-format` — available even though FLAF is a submodule.)
+
+## The standard checks
+
+| Workflow | Runs on | What it checks |
+|---|---|---|
+| `formatting-check.yaml` | PRs | Code style: **flake8**/**black** (Python), **clang-format** (C++), **yamllint** (YAML). |
+| `repo-sanity-checks.yaml` | PRs | Submodule-pointer consistency, repository health, no stray binary files. |
+| `test-setup-loading.yaml` | PRs | Actually loads `Setup.py` for **every configured era** — catches config typos and broken references early (a real run, not a dry run). |
+| `trigger-flaf-integration.yaml` | PR comments | Parses a `@cms-flaf-bot` comment and triggers the GitLab pipeline. See [Integration pipeline](integration-pipeline.md). |
+
+FLAF itself additionally runs:
+
+| Workflow | What it checks |
+|---|---|
+| `cross-section-check.yaml` | Cross-section values are consistent/valid. |
+| `ds-consistency-check.yaml` | `datasets.yaml` entries are well-formed (generator, resolvable cross-section, naming) via `test/checkDatasetConfigConsistency.py`. |
+
+## Passing the checks before you push
+
+Formatting is enforced, so format **before** committing. The convenience script applies all
+formatters at once (with `flaf_env` active):
+
+```sh
+bash run_tools/apply_format.sh
+```
+
+Or run them individually:
+
+```sh
+black <file.py>                                   # Python
+clang-format -i --style "file:.clang-format" <f>  # C++
+yamllint -s -c .yamllint <file.yaml>              # YAML
+```
+
+If you edited `datasets.yaml`, also run the consistency check from
+[Datasets](../configuration/datasets.md#validate-the-dataset-config). See
+[Contributing](../contributing.md) for the full pre-PR checklist.
+
+!!! note "Required secrets"
+    The bot-trigger workflow needs the org-level secrets `FLAF_INTEGRATION_TOKEN` (GitLab trigger)
+    and `FLAF_GITHUB_TOKEN` (to post the reply comment), inherited via `secrets: inherit`. The
+    quality checks need no secrets.
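
The `trigger-flaf-integration.yaml` row in the table of standard checks says the workflow parses a `@cms-flaf-bot` comment, and `docs/ci/integration-pipeline.md` lists the override lines that may follow the header. A minimal sketch of such a parsing step could look like the block below. It is not the workflow's actual code: the function name `parse_bot_comment`, the regular expressions, and the mapping from a pull-request or branch URL to a `<repo>_version` value are assumptions made for illustration.

```python
import re

BOT_HEADER = "@cms-flaf-bot please test"  # header as shown in the docs

# Hypothetical patterns for the override forms listed in the docs.
_PULL_URL = re.compile(r"^https://github\.com/[^/]+/(?P<repo>[^/]+)/pull/(?P<n>\d+)/?$")
_TREE_URL = re.compile(r"^https://github\.com/[^/]+/(?P<repo>[^/]+)/tree/(?P<branch>.+)$")
_KEY_VALUE = re.compile(r"^(?P<key>\w+)=(?P<value>\S+)$")


def parse_bot_comment(body):
    """Return a dict of variable overrides, or None if the comment is not a trigger."""
    lines = [line.strip() for line in body.strip().splitlines()]
    if not lines or lines[0] != BOT_HEADER:
        return None
    overrides = {}
    for line in lines[1:]:
        if not line.startswith("- "):
            continue  # ignore anything that is not a list item
        item = line[2:].strip()
        if m := _PULL_URL.match(item):
            # Assumption: a pull-request URL is equivalent to <repo>_version=PR_<n>.
            overrides[f"{m['repo']}_version"] = f"PR_{m['n']}"
        elif m := _TREE_URL.match(item):
            # Assumption: a branch URL sets <repo>_version to the branch name.
            overrides[f"{m['repo']}_version"] = m["branch"]
        elif m := _KEY_VALUE.match(item):
            overrides[m["key"]] = m["value"]  # e.g. gitlab_branch=<branch>
        else:
            raise ValueError(f"unrecognised override: {item!r}")
    return overrides
```

Returning `None` for a comment without the header and an empty dict for a bare trigger keeps "not a trigger" distinct from "trigger with no overrides". Rejecting unknown list items makes a typo in an override visible instead of silently testing the wrong version.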
diff --git a/docs/ci/integration-pipeline.md b/docs/ci/integration-pipeline.md
@@ -0,0 +1,103 @@
+# Integration pipeline
+
+The **FLAF integration pipeline** runs the actual analysis pipeline end-to-end (on tiny test
+inputs) to check that a change produces correct results — not just that it is well formatted. It
+runs on **GitLab CI at CERN** (project
+[`cms-flaf/flaf_integration`](https://gitlab.cern.ch/cms-flaf/flaf_integration), project id
+`210600`) and is triggered from GitHub by a bot comment.
+
+## Triggering it: `@cms-flaf-bot please test`
+
+On a pull request (in a repo that supports it), an authorised user posts a comment:
+
+```text
+@cms-flaf-bot please test
+```
+
+The `trigger-flaf-integration.yaml` workflow then:
+
+1. checks that the commenter is listed in `authorized_users` and that the comment header is recognised;
+2. reads `.github/integration_cfg.yaml` **from the PR's branch**;
+3. substitutes the PR's own version (so the pipeline tests *this* PR);
+4. triggers the GitLab pipeline and posts back a `[pipeline#…] started` comment (or a 👎 reaction if
+   it could not start).
+
+Repos with the trigger enabled: HH_bbtautau, HH_bbWW, H_mumu, FLAF, Corrections, StatInference.
+
+!!! tip "Test a change that spans repositories"
+    Add list items below the header to point a dependency at your PR or branch, e.g.:
+    ```text
+    @cms-flaf-bot please test
+    - https://github.com/cms-flaf/FLAF/pull/272
+    ```
+    Accepted forms include `- <repo>_version=PR_<n>`, a `…/pull/<n>` URL, a `…/tree/<branch>` URL, and
+    `- gitlab_branch=<branch>` to run a non-default `flaf_integration` branch.
+
+## `integration_cfg.yaml`
+
+Each participating repo has `.github/integration_cfg.yaml`. It lists who may trigger, the accepted
+comment headers, and the **variables** passed to the pipeline:
+
+```yaml
+variables:
+  HH_bbtautau_version: "main"
+  FLAF_version: "default"          # "default" = keep flaf_integration's current value
+  Corrections_version: "default"
+  HH_bbtautau_active: "1"          # "1" = run this analysis, "0" = skip
+  HH_bbtautau_task: "FLAF.Analysis.tasks.HistPlotTask"
+  HH_bbtautau_args: "--branches 0 --test 1000"
+  HH_bbtautau_eras: "Run3_2022 Run3_2022EE Run3_2023 Run3_2023BPix"
+  HH_bbtautau_processes: "custom_CI_Signal custom_CI_Background custom_CI_Data"
+  TEST_TIMEOUT: "4h"
+```
+
+| Variable | Meaning |
+|---|---|
+| `<ana>_active` | Whether to run that analysis (`1`/`0`). |
+| `<ana>_version` / `<pkg>_version` | Which version of a repo to use; `default` keeps the pipeline's current value. |
+| `<ana>_task` | The target task (the pipeline runs everything up to it). |
+| `<ana>_args` | Extra `law run` arguments (e.g. `--branches 0 --test 1000`). |
+| `<ana>_eras` | Eras to test (space-separated, or `ALL`). |
+| `<ana>_processes` | The processes to test (space-separated). **Required** for an active analysis — there is no default. |
+
+!!! warning "`<ana>_processes` must be set for an active analysis"
+    The pipeline **errors at generation time** if an active analysis has an empty `<ana>_processes`. The values
+    live in each repo's `integration_cfg.yaml` (capitalised for HH analyses, lower-case for H→μμ —
+    see [Processes & models](../configuration/processes-and-models.md)). They are declared but left
+    empty in `flaf_integration/.gitlab-ci.yml`, so the trigger accepts them while the real values
+    come from the triggering repo.
+
+### Root packages vs packages
+
+The shared trigger logic distinguishes:
+
+- **root packages** — repos with an `_active` variable (the analyses: HH_bbtautau, HH_bbWW,
+  H_mumu);
+- **packages** — repos with a `_version` but no `_active` (FLAF, Corrections, StatInference).
+
+Both may trigger the pipeline; the distinction matters only when editing the trigger logic.
+
+## What the pipeline does
+
+```mermaid
+flowchart LR
+    P[Parent pipeline<br/>.gitlab-ci.yml] -->|generate_child_pipeline.py| C[Child pipeline]
+    C --> B[build: per analysis]
+    B --> T1[test_dataset:<br/>per process]
+    T1 --> T2[test_era / test_multi_era]
+    T2 --> N[notify GitHub]
+```
+
+- The **parent** pipeline runs `scripts/generate_child_pipeline.py`, which expands the active
+  analyses × eras × processes into concrete jobs (pure Python, no PyYAML on the runner).
+- The **child** pipeline builds each active analysis once, then runs the requested task per
+  process/era on tiny inputs (`--test`), and finally notifies GitHub of success/failure.
+- Disabled analyses/eras are simply not emitted; jobs are non-interruptible so parallel pipelines
+  on the same branch don't cancel each other.
+
+## Reproducing CI locally
+
+You can run what a CI job runs without the bot — point `fs_default` at a local path, use
+`phys_model: TestModel` and `--test 1000`, and launch the target task with `--workflow local`. See
+[Your first run](../getting-started/first-run.md) and the
+[`user_custom.yaml` guide](../configuration/user-custom.md).
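
The "What the pipeline does" section says the parent pipeline expands active analyses × eras × processes into concrete jobs, and the warning above it says generation fails when an active analysis has no processes. A minimal pure-Python sketch of that expansion follows. It is not the real `scripts/generate_child_pipeline.py`: the function name `expand_jobs`, the job dictionary layout, and the `all_eras` argument used to resolve `ALL` are assumptions, and the build / `test_dataset` / `test_era` stage structure is left out.

```python
def expand_jobs(variables, analyses, all_eras):
    """Expand integration_cfg-style variables into one job record per
    (analysis, era, process). Inactive analyses produce no jobs."""
    jobs = []
    for ana in analyses:
        if variables.get(f"{ana}_active", "0") != "1":
            continue  # disabled analyses are simply not emitted
        processes = variables.get(f"{ana}_processes", "").split()
        if not processes:
            # Mirrors the documented behaviour: fail at generation time.
            raise ValueError(f"{ana}_processes must be set for an active analysis")
        eras = variables.get(f"{ana}_eras", "").split()
        if eras == ["ALL"]:
            eras = list(all_eras)  # assumption: ALL expands to a known era list
        for era in eras:
            for process in processes:
                jobs.append({
                    "analysis": ana,
                    "era": era,
                    "process": process,
                    "task": variables[f"{ana}_task"],
                    "args": variables.get(f"{ana}_args", ""),
                })
    return jobs


# Values copied from the integration_cfg.yaml example in the docs.
example_variables = {
    "HH_bbtautau_active": "1",
    "HH_bbtautau_task": "FLAF.Analysis.tasks.HistPlotTask",
    "HH_bbtautau_args": "--branches 0 --test 1000",
    "HH_bbtautau_eras": "Run3_2022 Run3_2022EE Run3_2023 Run3_2023BPix",
    "HH_bbtautau_processes": "custom_CI_Signal custom_CI_Background custom_CI_Data",
}
example_jobs = expand_jobs(example_variables, ["HH_bbtautau", "H_mumu"], [])
print(len(example_jobs))  # 4 eras x 3 processes = 12
```

`H_mumu` has no `_active` entry in the example, so it contributes no jobs. Only an analysis that is active with an empty process list raises.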