48 changes: 48 additions & 0 deletions .github/workflows/deploy-docs.yaml
@@ -0,0 +1,48 @@
name: Deploy documentation

on:
  push:
    branches: [ main ]
    paths:
      - "docs/**"
      - "mkdocs.yml"
      - ".github/workflows/deploy-docs.yaml"
  pull_request:
    paths:
      - "docs/**"
      - "mkdocs.yml"
      - ".github/workflows/deploy-docs.yaml"
  workflow_dispatch:

permissions:
  contents: write

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Install MkDocs Material
        run: pip install mkdocs-material
      - name: Build site (strict)
        run: mkdocs build --strict

  deploy:
    needs: validate
    if: github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    concurrency:
      group: deploy-docs
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Install MkDocs Material
        run: pip install mkdocs-material
      - name: Deploy to GitHub Pages
        run: mkdocs gh-deploy --force
13 changes: 9 additions & 4 deletions README.md
@@ -1,8 +1,13 @@
-# HH -> bbtautau Framework
+# FLAF

-FLAF - Flexible LAW-based Analysis Framework.
-Task workflow managed is done via [LAW](https://github.com/riga/law) (Luigi Analysis Framework).
+**FLAF** — the Flexible LAW-based Analysis Framework — is the shared CMS analysis framework used by
+the HH→bb̄ττ, HH→bb̄WW and H→μμ analyses. Task workflows are managed with
+[LAW](https://github.com/riga/law) (the Luigi Analysis Workflow).

-Documentation is available on [GitHub Pages](https://cms-flaf.github.io/FLAF/).
+📖 **Documentation: <https://cms-flaf.github.io/FLAF/>**

+FLAF is included as a git submodule inside each analysis repository — you do not clone it on its own
+to run an analysis. Start with the
+[installation guide](https://cms-flaf.github.io/FLAF/getting-started/installation/).


41 changes: 41 additions & 0 deletions docs/analyses.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Analyses

FLAF is shared by three analyses. The **common** pipeline is documented here; each analysis adds
its own physics — extra submodules, observables, signals and (for the HH analyses) statistical
inference — documented in that analysis's own `docs/`.

| Analysis | Channel | Adds on top of FLAF | Docs |
|---|---|---|---|
| **HH→bb̄ττ** | HH → bb̄ττ | SVfit (`ClassicSVfit`, `SVfitTF`), `HHKinFit2`, `HHbtag`, DeepTau; resonant + non-resonant signals; `StatInference`. | [github.com/cms-flaf/HH_bbtautau](https://github.com/cms-flaf/HH_bbtautau) → `docs/` |
| **HH→bb̄WW** | HH → bb̄WW | `DeepHME` mass reconstruction; b-tag-shape caching (`AnalysisCacheTask`); `StatInference`. | [github.com/cms-flaf/HH_bbWW](https://github.com/cms-flaf/HH_bbWW) → `docs/` |
| **H→μμ** | H → μμ | Single-Higgs; the simplest setup (just `FLAF` + `Corrections`); **no** statistical-inference submodule. | [github.com/cms-flaf/H_mumu](https://github.com/cms-flaf/H_mumu) → `docs/` |

## What is common vs analysis-specific

- **Common (here, in FLAF):** the [task graph](concepts/data-flow.md), the
[configuration system](concepts/configuration.md), the [environment](concepts/environment.md),
[storage](concepts/storage.md), [eras](concepts/eras.md) and [CI](ci/integration-pipeline.md).
The [full-workflow walkthrough](workflow/walkthrough.md) applies to every analysis.
- **Analysis-specific (in each repo's `docs/`):** the extra physics submodules and how to set them
up, the analysis's signals and processes, its observables and any analysis-only steps, and — for
HH→bb̄ττ and HH→bb̄WW — the statistical-inference configuration.

## HH→bb̄ττ — the reference analysis

The most feature-complete analysis: SVfit and HHKinFit2 mass reconstruction, the HHbtag b-jet
identifier, DeepTau-based τ identification (select the version with
`--customisations deepTauVersion=2p5`), and resonant + non-resonant signal models. Used throughout
these docs as the worked example.

## HH→bb̄WW

Uses `DeepHME` for mass reconstruction instead of SVfit. Its pipeline inserts a b-tag-shape caching
step (`AnalysisCacheTask`/`AnalysisCacheAggregationTask`) before histogramming — see the caveat in
the [walkthrough](workflow/walkthrough.md#stage-2-compute-analysis-observables-histtuples) and
[Task reference](reference/tasks.md#analysiscachetask).

## H→μμ

A single-Higgs analysis with the leanest submodule set (no `StatInference`/`inference`). Its CI
runs over **all** Run 3 eras (`H_mumu_eras: ALL`), and its CI process names are lower-case
(`custom_CI_signal`, …) — see [Processes & models](configuration/processes-and-models.md).
45 changes: 0 additions & 45 deletions docs/analysis.md

This file was deleted.

67 changes: 67 additions & 0 deletions docs/ci/github-actions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# GitHub Actions

FLAF uses **two** continuous-integration systems:

| System | Where | Purpose |
|---|---|---|
| **GitHub Actions** | GitHub | Fast code-quality and sanity checks on every pull request. |
| **FLAF integration** | GitLab CI (CERN) | The full pipeline run that checks physics correctness. Triggered by a bot comment — see [Integration pipeline](integration-pipeline.md). |

This page covers the GitHub Actions checks.

## Shared, reusable workflows

The analysis repositories don't duplicate CI logic. Each workflow is a thin wrapper that calls the
shared implementation in FLAF:

```yaml
jobs:
  my-job:
    uses: cms-flaf/FLAF/.github/workflows/<workflow>.yaml@main
    secrets: inherit
```

So fixing a check in FLAF fixes it everywhere. (A checkout helper inside the shared workflows makes
the FLAF tooling — `.yamllint`, `.clang-format` — available even though FLAF is a submodule.)

## The standard checks

| Workflow | Runs on | What it checks |
|---|---|---|
| `formatting-check.yaml` | PRs | Code style: **flake8**/black (Python), **clang-format** (C++), **yamllint** (YAML). |
| `repo-sanity-checks.yaml` | PRs | Submodule-pointer consistency, repository health, no stray binary files. |
| `test-setup-loading.yaml` | PRs | Actually loads `Setup.py` for **every configured era** — catches config typos and broken references early (a real run, not a dry run). |
| `trigger-flaf-integration.yaml` | PR comments | Parses a `@cms-flaf-bot` comment and triggers the GitLab pipeline. See [Integration pipeline](integration-pipeline.md). |

FLAF itself additionally runs:

| Workflow | What it checks |
|---|---|
| `cross-section-check.yaml` | Cross-section values are consistent/valid. |
| `ds-consistency-check.yaml` | `datasets.yaml` entries are well-formed (generator, resolvable cross-section, naming) via `test/checkDatasetConfigConsistency.py`. |

## Passing the checks before you push

Formatting is enforced, so format **before** committing. The convenience script applies all
formatters at once (with `flaf_env` active):

```sh
bash run_tools/apply_format.sh
```

Or run them individually:

```sh
black <file.py> # Python
clang-format -i --style "file:.clang-format" <f> # C++
yamllint -s -c .yamllint <file.yaml> # YAML
```

If you edited `datasets.yaml`, also run the consistency check from
[Datasets](../configuration/datasets.md#validate-the-dataset-config). See
[Contributing](../contributing.md) for the full pre-PR checklist.
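
As a rough illustration of how the individual formatter commands above map onto file types, the sketch below picks the command line for a given path. It is not part of FLAF: the function name `format_command` and the set of C++ extensions are assumptions made for this example, while the command arguments are the ones listed above.

```python
from pathlib import Path

# Hypothetical helper, not part of FLAF: choose the formatter/linter command
# for one file, using the same arguments as the manual commands above.
CPP_SUFFIXES = {".cpp", ".cc", ".h", ".hpp"}  # assumed set of C++ extensions
YAML_SUFFIXES = {".yaml", ".yml"}


def format_command(path: str) -> list:
    """Return the argv list to run for `path`, or [] if no tool applies."""
    suffix = Path(path).suffix.lower()
    if suffix == ".py":
        return ["black", path]
    if suffix in CPP_SUFFIXES:
        return ["clang-format", "-i", "--style", "file:.clang-format", path]
    if suffix in YAML_SUFFIXES:
        return ["yamllint", "-s", "-c", ".yamllint", path]
    return []


if __name__ == "__main__":
    for name in ["Analysis/tasks.py", "include/Utils.h", "config/datasets.yaml"]:
        print(name, "->", " ".join(format_command(name)))
```

The returned list can be passed directly to `subprocess.run(...)`; in normal use `run_tools/apply_format.sh` remains the simpler option.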

!!! note "Required secrets"
    The bot-trigger workflow needs the org-level secrets `FLAF_INTEGRATION_TOKEN` (GitLab trigger)
    and `FLAF_GITHUB_TOKEN` (to post the reply comment), inherited via `secrets: inherit`. The
    quality checks need no secrets.
103 changes: 103 additions & 0 deletions docs/ci/integration-pipeline.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# Integration pipeline

The **FLAF integration pipeline** runs the actual analysis pipeline end-to-end (on tiny test
inputs) to check that a change produces correct results — not just that it is well formatted. It
runs on **GitLab CI at CERN** (project
[`cms-flaf/flaf_integration`](https://gitlab.cern.ch/cms-flaf/flaf_integration), project id
`210600`) and is triggered from GitHub by a bot comment.

## Triggering it: `@cms-flaf-bot please test`

On a pull request (in a repo that supports it), an authorised user posts a comment:

```text
@cms-flaf-bot please test
```

The `trigger-flaf-integration.yaml` workflow then:

1. checks the commenter is in `authorized_users` and the header is recognised;
2. reads `.github/integration_cfg.yaml` **from the PR's branch**;
3. substitutes the PR's own version (so the pipeline tests *this* PR);
4. triggers the GitLab pipeline and posts back a `[pipeline#…] started` comment (or a 👎 reaction if
it could not start).

Repos with the trigger enabled: HH_bbtautau, HH_bbWW, H_mumu, FLAF, Corrections, StatInference.
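
Step 1 of the list above can be pictured as a simple gate on the commenter and the first line of the comment. The sketch below is only an illustration under assumptions: `may_trigger` and the `comment_headers` key are hypothetical names, and only `authorized_users` is taken from the description above; the real workflow may structure this check differently.

```python
# Illustrative sketch of the authorisation gate (step 1). The function name
# and the "comment_headers" key are hypothetical; "authorized_users" is the
# name used in the description of the trigger workflow.
def may_trigger(commenter: str, comment: str, cfg: dict) -> bool:
    """Return True if this comment should start the integration pipeline."""
    lines = comment.strip().splitlines()
    if not lines:
        return False
    header = lines[0].strip()
    return (
        commenter in cfg.get("authorized_users", [])
        and header in cfg.get("comment_headers", [])
    )


if __name__ == "__main__":
    cfg = {
        "authorized_users": ["alice"],  # hypothetical user name
        "comment_headers": ["@cms-flaf-bot please test"],
    }
    print(may_trigger("alice", "@cms-flaf-bot please test", cfg))
    print(may_trigger("mallory", "@cms-flaf-bot please test", cfg))
```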

!!! tip "Test a change that spans repositories"
    Add lines to point a dependency at your PR or branch, e.g.:

    ```text
    @cms-flaf-bot please test
    - https://github.com/cms-flaf/FLAF/pull/272
    ```

    Shorthands include `- <repo>_version=PR_<n>`, a `…/pull/<n>` URL, a `…/tree/<branch>` URL, and
    `- gitlab_branch=<branch>` to run a non-default `flaf_integration` branch.
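
To make the shorthands concrete, here is a minimal sketch of turning the extra comment lines into pipeline variables. It rests on assumptions that are not confirmed by these docs: that a pull-request URL is equivalent to `<repo>_version=PR_<n>`, that a `tree` URL maps to the bare branch name, and that the function name `parse_overrides` is purely illustrative.

```python
import re

# Assumed URL shapes; the actual trigger workflow may accept other forms.
PULL_URL = re.compile(r"https://github\.com/[^/]+/(?P<repo>[^/]+)/pull/(?P<n>\d+)/?$")
TREE_URL = re.compile(r"https://github\.com/[^/]+/(?P<repo>[^/]+)/tree/(?P<branch>.+)$")


def parse_overrides(comment: str) -> dict:
    """Collect variable overrides from the '- ...' lines after the header."""
    overrides = {}
    for line in comment.splitlines()[1:]:  # the first line is the header
        line = line.strip()
        if not line.startswith("-"):
            continue
        item = line[1:].strip()
        if m := PULL_URL.match(item):
            # Assumption: a PR URL means "<repo>_version=PR_<n>".
            overrides[f"{m['repo']}_version"] = f"PR_{m['n']}"
        elif m := TREE_URL.match(item):
            # Assumption: a tree URL means "<repo>_version=<branch>".
            overrides[f"{m['repo']}_version"] = m["branch"]
        elif "=" in item:
            key, value = item.split("=", 1)
            overrides[key.strip()] = value.strip()
    return overrides


if __name__ == "__main__":
    text = "@cms-flaf-bot please test\n- https://github.com/cms-flaf/FLAF/pull/272"
    print(parse_overrides(text))
```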

## `integration_cfg.yaml`

Each participating repo has `.github/integration_cfg.yaml`. It lists who may trigger, the accepted
comment headers, and the **variables** passed to the pipeline:

```yaml
variables:
  HH_bbtautau_version: "main"
  FLAF_version: "default"        # "default" = keep flaf_integration's current value
  Corrections_version: "default"
  HH_bbtautau_active: "1"        # "1" = run this analysis, "0" = skip
  HH_bbtautau_task: "FLAF.Analysis.tasks.HistPlotTask"
  HH_bbtautau_args: "--branches 0 --test 1000"
  HH_bbtautau_eras: "Run3_2022 Run3_2022EE Run3_2023 Run3_2023BPix"
  HH_bbtautau_processes: "custom_CI_Signal custom_CI_Background custom_CI_Data"
  TEST_TIMEOUT: "4h"
```

| Variable | Meaning |
|---|---|
| `<ana>_active` | Whether to run that analysis (`1`/`0`). |
| `<ana>_version` / `<pkg>_version` | Which version of a repo to use; `default` keeps the pipeline's current value. |
| `<ana>_task` | The target task (the pipeline runs everything up to it). |
| `<ana>_args` | Extra `law run` arguments (e.g. `--branches 0 --test 1000`). |
| `<ana>_eras` | Eras to test (space-separated, or `ALL`). |
| `<ana>_processes` | The processes to test (space-separated). **Required** for an active analysis — there is no default. |

!!! warning "`<ana>_processes` must be set for an active analysis"
    The pipeline **errors at generation time** if an active analysis has no `processes`. The values
    live in each repo's `integration_cfg.yaml` (capitalised for HH analyses, lower-case for H→μμ;
    see [Processes & models](../configuration/processes-and-models.md)). They are declared but left
    empty in `flaf_integration/.gitlab-ci.yml`, so the trigger accepts them while the real values
    come from the triggering repo.
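
The rule in the warning above can be checked before triggering. The following is a minimal sketch, assuming the variables have already been loaded into a plain dictionary of strings; `missing_processes` is a hypothetical helper, not a function from `flaf_integration`.

```python
# Hypothetical pre-flight check: list active analyses whose
# "<ana>_processes" variable is missing or empty.
def missing_processes(variables: dict) -> list:
    missing = []
    for name, value in variables.items():
        if name.endswith("_active") and value == "1":
            ana = name[: -len("_active")]
            if not variables.get(f"{ana}_processes", "").strip():
                missing.append(ana)
    return missing


if __name__ == "__main__":
    variables = {
        "HH_bbtautau_active": "1",
        "HH_bbtautau_processes": "custom_CI_Signal custom_CI_Background custom_CI_Data",
        "H_mumu_active": "1",
        "H_mumu_processes": "",  # declared but left empty
    }
    print(missing_processes(variables))
```

An empty result means every active analysis has at least one process listed.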

### Root packages vs packages

The shared trigger logic distinguishes:

- **root packages** — repos with an `_active` variable (the analyses: HH_bbtautau, HH_bbWW,
H_mumu);
- **packages** — repos with a `_version` but no `_active` (FLAF, Corrections, StatInference).

Both may trigger the pipeline; the distinction matters only when editing the trigger logic.
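
The distinction follows directly from the variable names, as the sketch below shows. It assumes the variables are available as a dictionary keyed by name; `classify` is an illustrative name, not the function used in the shared trigger logic.

```python
# Illustrative classification based only on the "_active" / "_version"
# suffixes described above.
def classify(variables: dict) -> tuple:
    """Return (root_packages, packages) as two sorted lists of repo names."""
    actives = {n[: -len("_active")] for n in variables if n.endswith("_active")}
    versions = {n[: -len("_version")] for n in variables if n.endswith("_version")}
    return sorted(actives), sorted(versions - actives)


if __name__ == "__main__":
    variables = {
        "HH_bbtautau_version": "main",
        "HH_bbtautau_active": "1",
        "FLAF_version": "default",
        "Corrections_version": "default",
    }
    roots, packages = classify(variables)
    print("root packages:", roots)
    print("packages:", packages)
```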

## What the pipeline does

```mermaid
flowchart LR
    P[Parent pipeline<br/>.gitlab-ci.yml] -->|generate_child_pipeline.py| C[Child pipeline]
    C --> B[build: per analysis]
    B --> T1[test_dataset:<br/>per process]
    T1 --> T2[test_era / test_multi_era]
    T2 --> N[notify GitHub]
```

- The **parent** pipeline runs `scripts/generate_child_pipeline.py`, which expands the active
analyses × eras × processes into concrete jobs (pure Python, no PyYAML on the runner).
- The **child** pipeline builds each active analysis once, then runs the requested task per
process/era on tiny inputs (`--test`), and finally notifies GitHub of success/failure.
- Disabled analyses/eras are simply not emitted; jobs are non-interruptible so parallel pipelines
on the same branch don't cancel each other.
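
The expansion of analyses × eras × processes can be sketched with the standard library alone, in the same dependency-free spirit as the generator script. This is not the code of `scripts/generate_child_pipeline.py`: the function name `expand_jobs`, the tuple job representation and the way `ALL` is resolved through an `all_eras` argument are assumptions for illustration.

```python
from itertools import product


def expand_jobs(variables: dict, analyses: list, all_eras: list) -> list:
    """Return one (analysis, era, process) tuple per job to emit."""
    jobs = []
    for ana in analyses:
        if variables.get(f"{ana}_active") != "1":
            continue  # disabled analyses are simply not emitted
        eras_value = variables.get(f"{ana}_eras", "").strip()
        # Assumption: "ALL" expands to every era known to the caller.
        eras = list(all_eras) if eras_value == "ALL" else eras_value.split()
        processes = variables.get(f"{ana}_processes", "").split()
        jobs.extend((ana, era, proc) for era, proc in product(eras, processes))
    return jobs


if __name__ == "__main__":
    variables = {
        "HH_bbtautau_active": "1",
        "HH_bbtautau_eras": "Run3_2022 Run3_2022EE",
        "HH_bbtautau_processes": "custom_CI_Signal custom_CI_Data",
        "H_mumu_active": "0",
    }
    for job in expand_jobs(variables, ["HH_bbtautau", "H_mumu"], ["Run3_2022"]):
        print(job)
```

With two eras and two processes for the single active analysis, the sketch yields four jobs; the inactive analysis contributes none.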

## Reproducing CI locally

You can run what a CI job runs without the bot — point `fs_default` at a local path, use
`phys_model: TestModel` and `--test 1000`, and launch the target task with `--workflow local`. See
[Your first run](../getting-started/first-run.md) and the
[`user_custom.yaml` guide](../configuration/user-custom.md).